Analyzing a selection strategy for data de-duplication in large datasets
, Kirubathangam R.
Published in International Journal of Pharmacy and Technology
2016
Volume: 8
Issue: 3
Pages: 16590 - 16595
Abstract
Record de-duplication aims to identify objects that are potentially replicated in a data repository. Although the concept is not new, it continues to receive significant attention from the database community and researchers because of the intrinsic difficulty of producing a redundancy-free repository, especially in the context of large datasets. In large-scale de-duplication, the blocking and classification phases typically rely on the user to configure or tune the process. For instance, the classification phase usually requires a manually labelled set of data. However, selecting and labelling such a set is a very costly task that is often restricted to expert users. Some active approaches have been proposed to address this problem by selecting informative pairs. © 2016, International Journal of Pharmacy and Technology. All rights reserved.
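To make the two phases named in the abstract concrete, the following is a minimal Python sketch, not the method of the paper. It assumes records are dictionaries with a single `name` field, uses a hypothetical blocking key (the first three letters of the name), and replaces the trained classifier with a fixed similarity threshold, which is exactly the kind of user-tuned parameter the abstract refers to. The names `block_key`, `candidate_pairs`, `similarity`, and `find_duplicates` are illustrative only.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(record):
    # Hypothetical blocking key: first three letters of the lower-cased name.
    return record["name"].lower()[:3]


def candidate_pairs(records):
    # Blocking phase: only records that share a key are compared,
    # which avoids comparing every record with every other record.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)
    for ids in blocks.values():
        yield from combinations(ids, 2)


def similarity(a, b):
    # String similarity in [0, 1] computed on the lower-cased names.
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def find_duplicates(records, threshold=0.85):
    # Classification phase, simplified: a fixed threshold (an assumption)
    # stands in for a classifier trained on manually labelled pairs.
    return [
        (i, j)
        for i, j in candidate_pairs(records)
        if similarity(records[i], records[j]) >= threshold
    ]


records = [
    {"name": "Jonathan Smith"},
    {"name": "Jonathon Smith"},
    {"name": "Joan Baker"},
    {"name": "Maria Garcia"},
]

duplicates = find_duplicates(records)
print(duplicates)
```

In this sketch both the blocking key and the threshold have to be chosen by hand. A labelled set of pairs would normally be used to choose or learn them, which is the costly step the abstract describes.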
About the journal
Journal: International Journal of Pharmacy and Technology
Publisher: International Journal of Pharmacy and Technology
ISSN: 0975-766X
Open Access: No