Ensemble Modeling Forms (DWH Models)

This is a copy/paste post of Hans Hultgren‘s blog post about the forms of Ensemble Modeling:
https://hanshultgren.wordpress.com/2013/06/16/ensemble-modeling-forms/

Ensemble Modeling Forms: Modeling the Agile Data Warehouse

Ensemble-modeling


Anchor Modeling. Data Vault Modeling. Focal Point Modeling. To name a few. In fact there are dozens of data warehouse data modeling patterns that have been introduced over the past decade. Among the top ones there are a set of defining characteristics. These characteristics are combined in the definition of Ensemble modeling forms (AKA Data Warehouse Modeling). See coverage notes on the Next Generation DWH Modeling conference here (and summary here).

The differences between them define the flavors of Ensemble Modeling. These flavors have vastly more in common than they have differences. When compared to 3NF or Dimensional modeling, the defining characteristics of the Ensemble forms have an 80/20 rule of commonality.

All these forms practice Unified Decomposition (breaking concepts into component parts) with a central unique instance as a centerstone (Anchor, Hub, Focal Point, etc.).
Each separates context attributes into dedicated table forms that are directly attached to the centerstone.
Each uncouples relationships from the concepts they seek to relate.
Each purposefully manages historical time-slice data with varying degrees of sophistication concerning temporal variations.
Each recognizes the differences between static and dynamic data.
Each recognizes the reality of working with constantly changing sources, transformation and rules.
Each recognizes the dichotomy of the enterprise-wide natural business key.
From that foundation of commonality, the various forms of Ensembles begin to take on their own flavors.

While Data Vault is foundationally based on the natural business key as the foundation of the centerstone (Hub), both Anchor and Focal Point center on a unique instance of a concept where the business key is strongly coupled but separate from the centerstone (Anchor, Focal Point).
Both Data Vault and Anchor aim to model Ensembles at the Core Business Concept level while Focal Point tends to deploy slightly more abstracted or generic forms of concepts.
Data Vault and Focal Point utilize forms of attribute clusters (logical combinations) for context while Anchor relies on single attribute context tables.
And there are other differentiating factors as well.

There is one thing that we can all agree on: modeling the agile data warehouse means applying some form of Ensemble modeling approach. The specific flavor choice (Data Vault, Anchor, Focal Point, etc.) should be based on the specific characteristics of your data warehouse program.

* Learn more on Anchor Modeling with Lars Rönnbäck here: Anchor Modeling

More info about Ensemble Modeling: https://hanshultgren.wordpress.com/2012/11/20/ensemble-modeling/








Advertisements

SSIS Anchor Modeling example (tutorial)

About
This Anchor Modeling SSIS example is brought to you by:

Clint Huijbers (BumbleBI IT)

Bas van den Berg (C2H)

The first version of the demo was created by Bas and shared on AnchorModeling.com.

Tutorial Video by Bas
Advice you to watch it first 🙂

Tutorial-by-Bas-van-den-Berg


New version of the SSIS package
The fast performing package (final step in his tutorial) was rebuild by Clint and is described below:

Download
To download the SSIS package and T-SQL scripts, click here.


The Anchor Model (AM)
The AM is about telephone calls, that includes two Anchors (Call and Phone), two Ties and three Attributes:

Anchor Model with mnemonics
AM-Model-with-mnemonics

Anchor Model without mnemonics
AM-Model-without-mnemonics


How to install

1) Create a new database called ‘AM_SSIS_Example

2) Create all AM-objects by executing the script: ‘Generated SQL Script by AM Online Modeler.sql

3) Generate source data by executing the script: ‘Create dbo_Calls table (source).sql
Give it a few minutes, it will generate 691.200 rows

The table dbo.Calls is already filled with demo data:
dbo_Calls

4) Create a Batch-table for metadata by executing the script: ‘Create BA_Batch table.sql

The database should now have a couple of tables (also Views, SPs and UDFs):
SQL-Server-Tables

5) Open the SSIS package in BIDS/SSDT and execute the package

The SSIS package demos two scenarios:
#1 – Insert records by using the INSERT-trigger on view dbo.lPhone
#2 – Insert records directly into the Anchor and Attributes
(by using the fast load / bulk insert and parallel processing of the Attributes)

SSIS-Package-Design

Raw File
A Raw File Source is used to temporary store and quickly re-load the full dataset, located in ‘C:\Temp\AM_SSIS_Example_RAW_FILE‘. Important: make sure to manually create/verify this Raw File. You’ll be able to find it here:

SSIS-Raw-File-Connection

Open the ‘Raw File Destination Editor‘-window (right-click and ‘Edit’) and verify the Raw File’s location:

SSIS-Raw-File

6) Watch the show

SSIS-Package-Executed

7) Verify the results by executing the script: ‘Verify the results.sql

AM-data-loaded

8) Increase the volume!
Would you like to re-process all data and perhaps with a lot more records? 😀
Truncate all tables by executing the script: ‘Truncate and delete all data.sql’
Now, drop the table dbo.Calls.
Execute the script ‘Create dbo_Calls table (source).sql‘ again, but this time increase the variable @EndDateTime with just a couple of days.
Have fun!

Coyote