Record deduplication aims to identify objects that are potentially replicated in a data repository. Although the concept is well established, it continues to receive significant attention from the database community and researchers because of the intrinsic difficulty of producing a redundancy-free repository, especially for large datasets. In large-scale deduplication, the blocking and classification phases typically rely on the user to configure or tune the process. For instance, the classification phase usually requires a manually labelled training set. However, selecting and labelling pairs for such a set is a very costly task that is often restricted to expert users. Some active learning approaches have been proposed to address this problem by selecting only the most informative pairs for labelling. © 2016, International Journal of Pharmacy and Technology. All rights reserved.