Analyzing a selection strategy for data de-duplication in large datasets

Sree Dharinya S; Kirubathangam R.

Published in International Journal of Pharmacy and Technology

2016

Volume: 8

Issue: 3

Pages: 16590 - 16595

Abstract

Record de-duplication aims to identify objects that are potentially replicated in a data repository. Although the concept is well established, it continues to receive significant attention from the database community and researchers because of the intrinsic difficulty of producing a redundancy-free repository, especially in the context of large datasets. In large-scale de-duplication, the blocking and classification phases typically rely on the user to configure or tune the process. For instance, the classification phase usually requires a manually labelled set of data. However, selecting and labelling such a set is a very costly task that is often restricted to expert users. Some active learning approaches have been proposed to address this problem by selecting the most informative pairs. © 2016, International Journal of Pharmacy and Technology. All rights reserved.
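
The abstract does not describe the selection strategy in detail, so the following is only a minimal sketch of the general idea it refers to: a blocking phase that limits which record pairs are compared, followed by a step that picks the most ambiguous pairs for manual labelling. The toy records, the first-letter blocking key, the `difflib` similarity measure, and the 0.5 midpoint are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Toy records (illustrative data, not from the paper).
records = {
    1: "John Smith, 12 Oak Street",
    2: "Jon Smith, 12 Oak St",
    3: "Jane Doe, 4 Elm Road",
    4: "Jane Do, 4 Elm Rd",
    5: "Alan Turing, 1 Bletchley Park",
}


def blocking_key(text):
    """Hypothetical blocking key: the first character, lowercased."""
    return text[0].lower()


def candidate_pairs(recs):
    """Blocking phase: only records sharing a blocking key are paired."""
    blocks = defaultdict(list)
    for rid, text in recs.items():
        blocks[blocking_key(text)].append(rid)
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


def similarity(a, b):
    """String similarity in [0, 1] using the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def select_for_labelling(recs, budget, midpoint=0.5):
    """Return up to `budget` candidate pairs whose similarity is closest
    to `midpoint`, i.e. the pairs a simple threshold rule would be least
    sure about. The midpoint value is an assumption."""
    scored = [
        (abs(similarity(recs[a], recs[b]) - midpoint), (a, b))
        for a, b in candidate_pairs(recs)
    ]
    scored.sort()
    return [pair for _, pair in scored[:budget]]


if __name__ == "__main__":
    print("candidate pairs:", candidate_pairs(records))
    print("pairs to label:", select_for_labelling(records, budget=2))
```

With these five toy records, blocking reduces the comparisons from all 10 possible pairs to the 6 pairs inside the shared block, and only the selected pairs would be passed to an expert for labelling.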

About the journal

Journal	International Journal of Pharmacy and Technology
Publisher	International Journal of Pharmacy and Technology
ISSN	0975-766X
Open Access	No

Authors (1)

Sree Dharinya S
