Efficient priority queue algorithm and strainer mode technique for identification and eradication of duplications in XML records
Padmasree, P. Anandhakumar, G. Deepti Raj, T. Rajendran
Published in Institute of Electrical and Electronics Engineers Inc.
2014
Pages: 106 - 113
Abstract
Detecting duplicates in a database is necessary, but eradicating the detected duplicates is also an important task. In order to retrieve valuable data, some form of data preprocessing must be performed. Data scrubbing, or extracting quality data, is one such preprocessing technique, and its most decisive task is detecting duplicate records. Databases may contain duplicate records because of data entry errors, differences in standardized abbreviations, or differences in the schemas of the records. If a database contains duplicate records, it is difficult both to examine the database and to mine the desired data. In this paper, we identify duplicate records in a bibliographical XML database using a simple yet efficient algorithm built on the structure of a priority queue. The duplicate records are then eliminated using the Strainer Mode technique, which helps maintain reasonable data quality in the database. Compared with the existing method, the proposed method performs best, with 0.8 as the threshold value for duplicate detection. © 2013 IEEE.
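The abstract does not give the details of either step, so the following is only a minimal sketch of how priority-queue-based detection followed by a filtering pass might look, not the authors' implementation. Everything except the 0.8 threshold is an assumption made for illustration: the `record` tag name, the `record_key` normalization, the bounded queue of recent cluster representatives with least-recently-matched eviction, the use of `difflib.SequenceMatcher` as the similarity measure, and the `strain` function standing in for the elimination step.

```python
import heapq
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

THRESHOLD = 0.8  # threshold point reported in the abstract


def record_key(elem):
    # Assumed normalization: join the lowercased text of all child fields.
    return " ".join((child.text or "").strip().lower() for child in elem)


def similarity(a, b):
    # Assumed similarity measure; the paper's actual measure is not stated.
    return SequenceMatcher(None, a, b).ratio()


def find_duplicates(keys, queue_size=4, threshold=THRESHOLD):
    """Return indices of records that match an earlier record in sorted order."""
    # Min-heap of (last_matched_step, representative_key). The representative
    # that was matched least recently is evicted first when the queue is full.
    queue = []
    duplicates = []
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    for step, i in enumerate(order):
        key = keys[i]
        match = None
        for pos, (_, rep) in enumerate(queue):
            if similarity(key, rep) >= threshold:
                match = pos
                break
        if match is not None:
            duplicates.append(i)
            # Refresh the priority of the matched representative.
            queue[match] = (step, queue[match][1])
            heapq.heapify(queue)
        else:
            if len(queue) >= queue_size:
                heapq.heappop(queue)
            heapq.heappush(queue, (step, key))
    return sorted(duplicates)


def strain(root, tag="record"):
    # Hypothetical elimination pass: drop every record flagged as a duplicate.
    records = root.findall(tag)
    keys = [record_key(r) for r in records]
    for i in find_duplicates(keys):
        root.remove(records[i])
    return root
```

Sorting the keys first places likely duplicates next to each other, so a small queue of representatives is enough to catch them without comparing every pair of records. The queue size trades recall for speed and would need tuning on real data.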