SCD1 – I hate history!
To implement SCD1 using wizard, perform the following steps:
- In the designer, navigate to Tools | Mapping Designer | Mapping | Wizard | Slowly Changing Dimensions, as shown in the following screenshot:
- A new window will pop up, asking you the name (
m_SCD1
) of the new SCD mapping. Select Type 1 Dimension - keep most recent values in the target, as we are implementing SCD1. Click on Next as follows: - The next screen will ask you to select the source. Select a source from the dropdown. All the sources present in your repository will be listed in this drop-down list. We are using
EMP_FILE.txt
as the source file for our reference. Also, specify the name of the target you wish to create. We will name the target asEMPLOYEE_SCD1
in this book for our reference. Click on Next. - In the next window, select EMPLOYEE_ID as Logical Key Field. This specifies which column will be used to check for the existence of data in the target. Make sure that the column you use is the
Key
column of the source. Also, add LOCATION under Fields to compare the changes. This specifies the column for which you wish to maintain the history. Click on Finish. - The wizard creates a complete mapping in your Mapping Designer Workspace. Make necessary changes to the mapping if required. An example of what your mapping could look like is as follows:
Before we proceed further, we need to make some points clear:
- As we have used flat file as a source, the Informatica PowerCenter wizard generates the target as a file as well. We cannot maintain SCD on files, so make sure you change the target type to the database. We will be changing this to the
Oracle
table as a reference. You can do this in the Target Designer. Drag the target (EMPLOYEE_SCD1
) created by the wizard in the Target Designer, double-click to open the properties, and change the database type ofOracle
. This will change the type of target from the file to Oracle. Once you modify the target table to the Oracle database, the mapping will look like the following figure: - The wizard creates two instances of the same Oracle target table in the mapping. Load the data from the
NEW
andCHANGED
flows, respectively. Understand clearly that these two structures refer to the same Oracle tableEMPLOYEE_SCD1
. Even though the name (EMPLOYEE_SCD1
andEMPLOYEE_SCD11
) is different in the view, when you double-click on the target instances in the Table tab, you can see Table Name asEMPLOYEE_SCD1
.
As we are done with the mapping, it's time to analyze it. It is very important to understand each component of the mapping.
The Informatica PowerCenter SCD1 mapping uses a lookup transformation to look up the data in the target table and uses expression transformation to compare the target data with the source data. Based on the comparison, the expression transformation marks a record as a NEW
flag or a CHANGED
flag. The mapping is divided into two flows:
- The
FIL_InsertNewRecord
filter transformation allows only theNEW
record to pass further and filters the records marked asCHANGED
from the first flow. It passes new records toUPD_ForceInserts
, which inserts these records into the target. The sequence generator generates the primary key for eachNEW
record. - The
FIL_UpdateChangedRecord
filter transformation allows only theCHANGED
record to pass further and filters the records marked asNEW
from the second flow. It passes the changed records toUPD_ChangedUpdate
, which replaces existing rows in the target to reflect the latest changes.
Let's understand each transformation that is used in the SCD1 mapping:
- The source qualifier (
SQ_EMP_FILE
): This extracts the data from the file or table you used as the source in the mapping. It passes data to the downstream transformations, that is, lookup, expression, and filter transformation. - Lookup (
LKP_GetData
): This is used to look up the target table. It caches the existing data from theEMPLOYEE_SCD1
table.The
EMPLOYEE_ID=IN_EMPLOYEE_ID
condition in the Condition tab will compare the data with the source table and the target table. Based on the comparison, it passes the required data to the expression transformation. - Expression (
EXP_DetectChanges
): This receives the data from the upstream transformation, and based on the comparison, it creates two flags, which areNewFlag
andChangedFlag
. In our case, we are using theLOCATION
field for comparison.For every record that comes from a source, if there is no matching record in target, we can flag that record as
NewFlag
; that is, theEMPLOYEE_ID != EMPLOYEE_ID
condition signifiesNewFlag
. If no matching record is present forEMPLOYEE_ID
in the target, it signifies thatPM_PRIMARYKEY
will not be available. So, the lookup transformation will returnNULL
for thePM_PRIMARYKEY
column.For every record that comes from a source, if there is a matching record in the target and if the location from source does not match the location for a particular
EMPLOYEE_ID
from the target, we can flag that record asChangedFlag
, that is,EMPLOYEE_ID = EMPLOYEE_ID AND LOCATION != PM_PREV_LOCATION
:- The wizard created the condition for
NewFlag
asIIF(ISNULL(PM_PRIMARYKEY), TRUE, FALSE)
- The condition for
ChangedFlag
isIIF(NOT ISNULL(PM_PRIMARYKEY) AND (DECODE(LOCATION,PM_PREV_LOCATION,1,0)=0), TRUE, FALSE)
Based on the condition, it passes the data to downstream filter transformations.
- The wizard created the condition for
- Filter (
FIL_InsertNewRecord
): This filters the records that come from an upstream expression transformation and are marked asChangedFlag
; it only allows records withNewFlag
to get passed to theUPD_ForceInserts
update strategy. - Filter (
FIL_UpdateChangedRecord
): This filters the records that come from an upstream expression transformation and are marked asNewFlag
; it only allows records withChangedFlag
to get passed to theUPD_ChangedUpdate
update strategy. - Update strategy (
UPD_ForceInserts
): This uses theDD_INSERT
condition to insert data into the target, which isEMPLOYEE_SCD1
. - Update strategy (
UPD_ChangedUpdate
): This uses theDD_UPDATE
condition to overwrite the existingLOCATION
field into theEMPLOYEE_SCD11
target instance. - Sequence generator (
SEQ_GenerateKeys
): This generates a sequence of values for each row marked asNewFlag
, which is then incrementally loaded into the target by 1. It populates the value intoPM_PRIMARYKEY
in theEMPLOYEE_SCD1
target instance. - Target (
EMPLOYEE_SCD1
): This is the target table instance that accepts theNewFlag
records into the target table. - Target (
EMPLOYEE_SCD11
): This is the target table instance that accepts theChangedFlag
records into the target table.