To allow your data warehouse facilitating business analysis, you need to load it regularly.
If the source data is not extracted , cleansed and integrated in the proper formats, it will be difficult perform query processing efficiently which is the ultimate purpose of data warehousing.
- Extraction Process and its Strategies
- Transformation and various tasks performed
- Loading Process and its Strategies
- ETL tools and its features.
Data Extraction:
Data is populated into warehouse from heterogeneous source systems. Hence it is important How you extract and transform the data.
Each data source has its distinct set of characteristics that need to be understood and integrated into the ETL system in order to effectively extract data.
Data Extraction Strategies:
- Full Extraction.
- Partial Extraction - with update notification.
- Partial Extraction - without update notification.
The extract step should be designed in a way that it does not negatively affect the source system in terms or performance, response time or any kind of locking.
Data Transformation:
Data Extracted into a staging server is a raw data and can not be used as it is. It needs to be Cleansed, Mapped and Transformed.
The transformation step also requires joining data from several sources, generating aggregates, generating surrogate keys, sorting, deriving new calculated values, and applying advanced validation rules.
- Selection
- Matching
- Data Cleansing or Enrichment
- Consolidations Or Summarization
Major Transformation Types
- Format Revisions
- Decoding of Fields
- Calculated and Derived Values.
- Splitting of Single Fields.
- Merging of Information.
- Character Set Conversion and Encoding handling.
- Conversion of Units of Measurements
- Date/Time Conversion.
- Summarization.
- Key Restructuring.
- Deduplication.
ETL Tools Features:
- Source and Target Data System Connectivity
- Scalability and Performance
- Easy Transformation connectors
- Data Profiling
- Data Cleaning and Quality
- Easy integration with Web services
- Logging and Exception Handling
- Robust Administration features
- Efficient Batch and Real time processing
Let me know, if you you want me to add any more details or if anything is not correct.
2. Data warehousing Architecture
Also, visit my other articles and videos:
2. Data warehousing Architecture
3. What are Dimension Tables?
No comments:
Post a Comment