ETL Tools as Components of Data Warehouse Efficiency
The efficiency and effectiveness of a Data Warehouse (DW) depend mainly on its extraction, transformation, and loading (ETL) components. The design and implementation of ETL is considered a supporting task for the DW.
ETL refers to the software tools devoted to automatically extracting, transforming, and loading data into the data warehouse.
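The three ETL stages can be sketched as three small functions. This is a minimal, hypothetical example (the source data, field names, and transformation rules are assumptions for illustration, not the thesis's implementation):

```python
# Minimal ETL sketch: extract raw records from a source, transform them
# into the warehouse schema, and load them into a target table.

def extract(source_rows):
    """Extract: read raw records from an operational source."""
    return list(source_rows)

def transform(rows):
    """Transform: normalize names and reject rows that cannot be keyed."""
    cleaned = []
    for row in rows:
        if row.get("customer_id") is None:
            continue  # drop records with no usable key
        cleaned.append({
            "customer_id": row["customer_id"],
            "name": row["name"].strip().title(),
        })
    return cleaned

def load(rows, warehouse):
    """Load: append the transformed rows to the warehouse table."""
    warehouse.extend(rows)
    return warehouse

source = [
    {"customer_id": 1, "name": "  alice SMITH "},
    {"customer_id": None, "name": "unknown"},
]
warehouse = load(transform(extract(source)), [])
# warehouse now holds one cleaned record for customer 1
```

In a real DW the load step would write to the warehouse database rather than a Python list; the three-stage structure is what matters here.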
Data Scrubbing (DS), one of the ETL tools, is used to ensure Data Quality (DQ).
DQ is a significant topic in data warehousing, data mining, and information systems. Low data quality degrades the quality of the results and analyses, and consequently the decisions made on the basis of them.
A main challenge in the DS process is the ambiguity of the scrubbing decisions that the scrubbing algorithms must take (e.g., deciding whether two records are duplicates or not). Existing data scrubbing systems deal with these ambiguities by committing to a single option, based on some heuristic, while ignoring all other options, which creates a false sense that the ambiguity has been removed. As a result, restarting the DS process from scratch is generally unavoidable whenever new evidence needs to be incorporated.
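The single-option heuristic described above can be illustrated with a small sketch. The similarity measure, field names, and threshold below are assumptions chosen for illustration, not the decision rule of any particular scrubbing system:

```python
# A scrubbing algorithm scores the similarity of two records and then
# commits to one duplicate/non-duplicate decision per pair; everything
# the score does not capture is discarded at the threshold.

from difflib import SequenceMatcher

THRESHOLD = 0.85  # assumed cut-off; real systems tune this heuristically

def similarity(a, b):
    """Similarity of two records, here based only on their name fields."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()

def is_duplicate(a, b, threshold=THRESHOLD):
    """Collapse the ambiguity into a single yes/no answer."""
    return similarity(a, b) >= threshold

r1 = {"name": "John A. Smith"}
r2 = {"name": "Jon A Smith"}
print(similarity(r1, r2))    # a borderline score near the threshold
print(is_duplicate(r1, r2))  # one option is chosen; the other is lost
```

A pair scoring just above or below the threshold shows the problem: the binary decision hides how uncertain the evidence actually was.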
This thesis presents a comparative analysis of DS (data purification) algorithms in tabular form. The comparison measures include accuracy and time complexity, and the advantages and disadvantages of each algorithm are listed in the same comparison table. Additionally, we compare and analyze DS frameworks and identify the best one.
This thesis focuses on two DS problems: missing values and duplicate records. Microsoft SQL Server and Microsoft SQL Server Analysis Services are used in this thesis. Microsoft SQL Server provides two key technologies: Query Analyzer and Data Transformation Services (DTS).
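The first of the two problems, missing values, is typically handled by imputation. The sketch below uses mean imputation as a common baseline; the data, field names, and strategy are assumptions for illustration, not the algorithms proposed in the thesis:

```python
# Baseline missing-value scrubbing: replace a missing numeric field
# with the mean of the observed values in the same column.

def impute_mean(rows, field):
    """Fill None in `field` with the mean of the non-missing values."""
    observed = [r[field] for r in rows if r[field] is not None]
    mean = sum(observed) / len(observed)
    return [
        {**r, field: mean if r[field] is None else r[field]}
        for r in rows
    ]

orders = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},   # missing value to be scrubbed
    {"order_id": 3, "amount": 30.0},
]
imputed = impute_mean(orders, "amount")
# the missing amount is filled with the mean of the observed amounts
```

Other strategies (a fixed default, the most frequent value, or a model-based prediction) slot into the same structure by swapping the fill rule.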
We have two approaches to solving these problems, and in this thesis we propose two algorithms. After detecting duplicated records, the procedure is as follows:
1. Add a temporary table, named the imagination table.
2. Move the data from the original table to the imagination table.
3. Remove the duplicate records from the imagination table instead of from the original table.
These steps are important because problems can appear when there is more than one relationship between tables.
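The steps above can be sketched as a runnable example. SQLite (via Python's standard library) stands in for Microsoft SQL Server here, and the table and column names are assumed for illustration:

```python
# Imagination-table workflow: copy the data to a temporary table, then
# de-duplicate there, so the original table and its relationships with
# other tables stay intact.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Original table containing a duplicated record.
cur.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Alice"), (2, "Bob"), (2, "Bob")])

# Step 1: add the temporary imagination table.
cur.execute("CREATE TEMP TABLE imagination (id INTEGER, name TEXT)")

# Step 2: move the data from the original table into it.
cur.execute("INSERT INTO imagination SELECT id, name FROM customers")

# Step 3: remove duplicates in the imagination table, not the original.
cur.execute("""
    DELETE FROM imagination
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM imagination
                        GROUP BY id, name)
""")

print(cur.execute("SELECT COUNT(*) FROM customers").fetchone()[0])    # original untouched: 3
print(cur.execute("SELECT COUNT(*) FROM imagination").fetchone()[0])  # duplicates removed: 2
```

Because the deletion happens in the temporary copy, foreign-key relationships that reference the original table are never at risk during scrubbing.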