Data Quality in Data Warehouses
Fayyad and Uthurusamy (2002) have stated that the majority of the work (often months or years) in creating a data warehouse goes into cleaning up duplicates and resolving other anomalies. This article provides an overview of two methods for improving data quality. The first is record linkage, for finding duplicates within a file or across files. The second is edit/imputation, for enforcing business rules and for filling in missing data. The fastest record linkage methods are suitable for files with hundreds of millions of records (Winkler, 2004a, 2008); the fastest edit/imputation methods are suitable for files with millions of records (Winkler, 2004b, 2007a).
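To make the two methods concrete, the following is a minimal Python sketch under stated assumptions: the toy records, field names, similarity threshold, and donor value are invented for illustration, and difflib's SequenceMatcher stands in for the Jaro-Winkler string comparator typically used in production record linkage. It is not Winkler's method; it only shows the general shape of blocking, pairwise comparison, edit rules, and imputation.

# Illustrative sketch only, not a production algorithm.
from difflib import SequenceMatcher

records = [
    {"id": 1, "name": "John A. Smith", "zip": "20233", "age": 34},
    {"id": 2, "name": "Jon Smith",     "zip": "20233", "age": None},
    {"id": 3, "name": "Mary K. Jones", "zip": "10001", "age": 12},
]

def block_key(rec):
    # Blocking: only records sharing a cheap key (here, ZIP code) are
    # compared, which keeps the number of pairwise comparisons tractable.
    return rec["zip"]

def similarity(a, b):
    # Stand-in for the Jaro-Winkler comparator used in practice.
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()

# Record linkage within one file: group records into blocks, then
# compare each pair inside a block and flag likely duplicates.
blocks = {}
for rec in records:
    blocks.setdefault(block_key(rec), []).append(rec)

for recs in blocks.values():
    for i in range(len(recs)):
        for j in range(i + 1, len(recs)):
            score = similarity(recs[i], recs[j])
            if score > 0.7:  # threshold chosen for illustration only
                print(f"possible duplicate: {recs[i]['id']} ~ {recs[j]['id']} "
                      f"(score={score:.2f})")

# Edit/imputation: fill missing values, then enforce a business rule
# (an "edit") that every record must satisfy.
DONOR_AGE = 34  # illustration only; real systems impute from similar records
for rec in records:
    if rec["age"] is None:
        rec["age"] = DONOR_AGE           # imputation for missing data
    assert 0 <= rec["age"] <= 120, rec   # edit rule: age must be plausible

Blocking is the design choice that connects this sketch to the scalability claims above: pairwise comparison of a whole file is quadratic, so restricting comparisons to records that share a key is what makes linkage feasible on files with hundreds of millions of records.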
CITATION: Winkler, W. E. (2008). Data Quality in Data Warehouses. In J. Wang (Ed.), Encyclopedia of Data Warehousing and Mining (2nd ed.). Hershey, PA: IGI Global. Available at: https://library.au.int/data-quality-data-warehouses