Sunday 18 October 2015

3. What is ETL | Extract Transformation and Load | Data Warehouse Tutorial

To allow your data warehouse facilitating business analysis, you need to load it regularly.

If the source data is not extracted , cleansed and integrated in the proper formats, it will be difficult perform query processing  efficiently which is the ultimate purpose of data warehousing. 

This video aims to provide an overview of ETL(Extract Load Transformation ) process and covers:

  • Extraction Process and its Strategies
  • Transformation and various tasks performed
  • Loading Process and its Strategies
  • ETL tools and its features.

Data Extraction:

Data is populated into warehouse from heterogeneous source systems. Hence it is important How you extract and transform the data.

Each data source has its distinct set of characteristics that need to be understood and integrated into the ETL system in order to effectively extract data.

Data Extraction Strategies:

  • Full Extraction.
  • Partial Extraction - with update notification.
  • Partial Extraction - without update notification.

The extract step should be designed in a way that it does not negatively affect the source system in terms or performance, response time or any kind of locking.

Data Transformation:

Data Extracted into a staging server is a raw data and can not be used as it is. It needs to be Cleansed, Mapped and Transformed.

The transformation step also requires joining data from several sources, generating aggregates, generating surrogate keys, sorting, deriving new calculated values, and applying advanced validation rules.

Basic task in Data Transformations

  • Selection
  • Matching
  • Data Cleansing or Enrichment
  • Consolidations Or Summarization 

Major Transformation Types

  • Format Revisions
  • Decoding of Fields
  • Calculated and Derived Values.
  • Splitting of Single Fields.
  • Merging of Information.
  • Character Set Conversion and Encoding handling.
  • Conversion of Units of Measurements
  • Date/Time Conversion.
  • Summarization.
  • Key Restructuring.
  • Deduplication.

ETL Tools: Talend Open Studio, Jaspersoft ETL, Ab initio, Informatica, Datastage, Clover ETL, Pentaho ETL, Kettle

ETL Tools Features:

  • Source and Target Data System Connectivity
  • Scalability and Performance
  • Easy Transformation connectors
  • Data Profiling
  • Data Cleaning and Quality
  • Easy integration with Web services
  • Logging and Exception Handling
  • Robust Administration features
  • Efficient Batch and Real time processing

Let me know, if you you want me to add any more details or if anything is not correct.

Also, visit my other articles and videos:

2. Data warehousing Architecture

3. What are Dimension Tables?

No comments:

Post a Comment