Project Description
A custom Data Flow component for SQL Server Integration Services (SSIS) that replaces the standard SCD Wizard with a superior experience, from the configuration UI to runtime performance. Performs 100x faster than the standard component, and edits are non-destructive.
(Not associated or endorsed by Ralph Kimball or the
Kimball Group - but read their books and go take their courses, they're excellent!)
Strengths
- One component on the design surface that can be edited without adverse effects on the rest of the Data Flow. (As opposed to the SCD Wizard that creates multiple components that are destroyed and rebuilt if the Wizard is run again.)
- Insane performance - measured to be 100x superior (see the Performance Improvement?) by use of multiple threads, sort optimization, and implied outcome determination. (As opposed to using the SCD Wizard with single-threaded uncached row-by-row lookups.)
- Surrogate Key management. (As opposed to zero support in the SCD Wizard.)
- "Special" (unknown) member support, per Kimball Method best practices. (As opposed to zero support in the SCD Wizard.)
- Includes a "row change reason" output column on all (except Unchanged) outputs, per Kimball Method best practices. (As opposed to zero support for debugging why a row was directed to a particular output in the SCD Wizard.)
- Supports simple and advanced styles of Row Auditing for inserts and updates, per Kimball Method best practices. (As opposed to zero auditing support in the SCD Wizard.)
- Flexible column comparisons: case (in)sensitive and space (in)sensitive as desired, plus culture-sensitivity. (As opposed to zero support in the SCD Wizard.)
- Flexible SCD 2 "current row" handling - permits specification of the date "endpoints". (As opposed to the SCD Wizard requiring the "expiry" date be NULL only.)
- Flexible SCD 2 date handling - permits specification of what date expired and new rows get marked with. (As opposed to the SCD Wizard leaving that up to a Derived Column component that will get destroyed when the Wizard is run again to add/fix other properties.)
- Reads the existing dimension from the Data Flow, not a Connection Manager - allowing the package designer to cache the existing dimension table as they see fit. (As opposed to the SCD Wizard that only supports some OLE DB Connection Managers.)
- Supports "retiring" dimension members that have been deleted from the Source System (by marking their "expiry date"). (As opposed to zero support for that in the SCD Wizard, because it uses "Add Support for a "Change Stream" Source System Input" style SCD processing. With the Wizard, your dimension table will never be queryable for "how many active things are there?")
Weaknesses