AIT Asian Institute of Technology

Duplicate record detection for database cleansing

Author: Rehman, Mariam
Call Number: AIT RSPR no. IM-09-01
Subject(s): Database management

Note: A research study submitted in partial fulfillment of the requirements for the degree of Master of Engineering in Information Management, School of Engineering and Technology
Publisher: Asian Institute of Technology
Series Statement: Research studies project report; no. IM-09-01
Abstract: Many organizations collect large amounts of data to support their business and decision-making processes. Data collected from various sources may contain data quality problems, and these problems become prominent when the source databases are integrated, because the integrated database inherits the quality problems that were present in each source. The data in integrated systems therefore needs to be cleansed before it can support sound decision making, which makes data cleansing one of the most crucial steps in the process. This research focuses on one of the major issues in data cleansing, duplicate record detection, which arises when data is collected from various sources. The study provides a comparison of the standard duplicate detection algorithm, the sorted neighborhood algorithm, the duplicate elimination sorted neighborhood algorithm, and the adaptive duplicate detection algorithm. A prototype is also developed, which shows that, among the approaches compared, the adaptive duplicate detection algorithm is the optimal solution to the problem of duplicate record detection.
Year: 2009
Corresponding Series Added Entry: Asian Institute of Technology. Research studies project report; no. IM-09-01
Type: Research Study Project Report (RSPR)
School: School of Engineering and Technology (SET)
Department: Department of Information and Communications Technologies (DICT)
Academic Program/FoS: Information Management (IM)
Chairperson(s): Vatcharaporn Esichaikul
Examination Committee(s): Vilas Wuwongse; Jenecek, Paul
Scholarship Donor(s): Higher Education Commission (HEC), Pakistan
Degree: Research Studies Project Report (M.Eng.) - Asian Institute of Technology, 2009

