Learning Informatica PowerCenter 9.x

上QQ阅读APP看书，第一时间看更新

SCD1 – I hate history!

To implement SCD1 using wizard, perform the following steps:

In the designer, navigate to Tools | Mapping Designer | Mapping | Wizard | Slowly Changing Dimensions, as shown in the following screenshot:
A new window will pop up, asking you the name (m_SCD1) of the new SCD mapping. Select Type 1 Dimension - keep most recent values in the target, as we are implementing SCD1. Click on Next as follows:
The next screen will ask you to select the source. Select a source from the dropdown. All the sources present in your repository will be listed in this drop-down list. We are using EMP_FILE.txt as the source file for our reference. Also, specify the name of the target you wish to create. We will name the target as EMPLOYEE_SCD1 in this book for our reference. Click on Next.
In the next window, select EMPLOYEE_ID as Logical Key Field. This specifies which column will be used to check for the existence of data in the target. Make sure that the column you use is the Key column of the source. Also, add LOCATION under Fields to compare the changes. This specifies the column for which you wish to maintain the history. Click on Finish.
The wizard creates a complete mapping in your Mapping Designer Workspace. Make necessary changes to the mapping if required. An example of what your mapping could look like is as follows:

Before we proceed further, we need to make some points clear:

As we have used flat file as a source, the Informatica PowerCenter wizard generates the target as a file as well. We cannot maintain SCD on files, so make sure you change the target type to the database. We will be changing this to the Oracle table as a reference. You can do this in the Target Designer. Drag the target (EMPLOYEE_SCD1) created by the wizard in the Target Designer, double-click to open the properties, and change the database type of Oracle. This will change the type of target from the file to Oracle. Once you modify the target table to the Oracle database, the mapping will look like the following figure:
The wizard creates two instances of the same Oracle target table in the mapping. Load the data from the NEW and CHANGED flows, respectively. Understand clearly that these two structures refer to the same Oracle table EMPLOYEE_SCD1. Even though the name (EMPLOYEE_SCD1 and EMPLOYEE_SCD11) is different in the view, when you double-click on the target instances in the Table tab, you can see Table Name as EMPLOYEE_SCD1.

As we are done with the mapping, it's time to analyze it. It is very important to understand each component of the mapping.

The Informatica PowerCenter SCD1 mapping uses a lookup transformation to look up the data in the target table and uses expression transformation to compare the target data with the source data. Based on the comparison, the expression transformation marks a record as a NEW flag or a CHANGED flag. The mapping is divided into two flows:

The FIL_InsertNewRecord filter transformation allows only the NEW record to pass further and filters the records marked as CHANGED from the first flow. It passes new records to UPD_ForceInserts, which inserts these records into the target. The sequence generator generates the primary key for each NEW record.
The FIL_UpdateChangedRecord filter transformation allows only the CHANGED record to pass further and filters the records marked as NEW from the second flow. It passes the changed records to UPD_ChangedUpdate, which replaces existing rows in the target to reflect the latest changes.

Let's understand each transformation that is used in the SCD1 mapping:

The source qualifier (SQ_EMP_FILE): This extracts the data from the file or table you used as the source in the mapping. It passes data to the downstream transformations, that is, lookup, expression, and filter transformation.
Lookup (LKP_GetData): This is used to look up the target table. It caches the existing data from the EMPLOYEE_SCD1 table.
The EMPLOYEE_ID=IN_EMPLOYEE_ID condition in the Condition tab will compare the data with the source table and the target table. Based on the comparison, it passes the required data to the expression transformation.
Expression (EXP_DetectChanges): This receives the data from the upstream transformation, and based on the comparison, it creates two flags, which are NewFlag and ChangedFlag. In our case, we are using the LOCATION field for comparison.
For every record that comes from a source, if there is no matching record in target, we can flag that record as NewFlag; that is, the EMPLOYEE_ID != EMPLOYEE_ID condition signifies NewFlag. If no matching record is present for EMPLOYEE_ID in the target, it signifies that PM_PRIMARYKEY will not be available. So, the lookup transformation will return NULL for the PM_PRIMARYKEY column.
For every record that comes from a source, if there is a matching record in the target and if the location from source does not match the location for a particular EMPLOYEE_ID from the target, we can flag that record as ChangedFlag, that is, EMPLOYEE_ID = EMPLOYEE_ID AND LOCATION != PM_PREV_LOCATION:
- The wizard created the condition for NewFlag as IIF(ISNULL(PM_PRIMARYKEY), TRUE, FALSE)
- The condition for ChangedFlag is IIF(NOT ISNULL(PM_PRIMARYKEY) AND (DECODE(LOCATION,PM_PREV_LOCATION,1,0)=0), TRUE, FALSE)
Based on the condition, it passes the data to downstream filter transformations.
Filter (FIL_InsertNewRecord): This filters the records that come from an upstream expression transformation and are marked as ChangedFlag; it only allows records with NewFlag to get passed to the UPD_ForceInserts update strategy.
Filter (FIL_UpdateChangedRecord): This filters the records that come from an upstream expression transformation and are marked as NewFlag; it only allows records with ChangedFlag to get passed to the UPD_ChangedUpdate update strategy.
Update strategy (UPD_ForceInserts): This uses the DD_INSERT condition to insert data into the target, which is EMPLOYEE_SCD1.
Update strategy (UPD_ChangedUpdate): This uses the DD_UPDATE condition to overwrite the existing LOCATION field into the EMPLOYEE_SCD11 target instance.
Sequence generator (SEQ_GenerateKeys): This generates a sequence of values for each row marked as NewFlag, which is then incrementally loaded into the target by 1. It populates the value into PM_PRIMARYKEY in the EMPLOYEE_SCD1 target instance.
Target (EMPLOYEE_SCD1): This is the target table instance that accepts the NewFlag records into the target table.
Target (EMPLOYEE_SCD11): This is the target table instance that accepts the ChangedFlag records into the target table.