A precise distance metric for mixed data clustering using chi-square statistics
S. Mohanavalli,
Published in Maxwell Science Publications
2015
Volume: 10
Issue: 12
Pages: 1441 - 1444
Abstract
Data today is often available as a mix of numerical and categorical values. Traditional clustering algorithms perform well on numerical data but produce poor clustering results on mixed data. For better partitioning, the distance metric must be able to discriminate between data points with mixed attributes, appropriately balancing the categorical distance against the numerical distance. In this study we propose a chi-square based statistical approach to determine the weights of the attributes. This weight vector is used to derive the distance matrix of the mixed dataset, and the distance matrix is then used to cluster the data points with traditional clustering algorithms. Experiments have been carried out on the UCI benchmark datasets heart, credit and vote. In addition to these datasets, we have also tested the proposed method on a real-time bank dataset. The accuracy of the clustering results obtained is better than that of existing works. © Maxwell Scientific Organization, 2015.
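The abstract describes a three-step pipeline (chi-square attribute weights, a weighted mixed-data distance matrix, then a traditional clustering algorithm) but does not give the exact formulas. The sketch below illustrates that pipeline under assumptions that are ours, not the paper's: numeric attributes are discretized into 3 equal-width bins so they can enter a contingency table; each attribute's weight is its summed Pearson chi-square statistic against every other attribute, normalized to sum to 1; the per-attribute distance is range-scaled |x - y| for numeric attributes and a 0/1 mismatch for categorical ones (a Gower-style choice); and single-linkage agglomeration stands in for "traditional clustering". All function names and the toy records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def chi_square(xs, ys):
    """Pearson chi-square statistic for two categorical sequences of equal length."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    stat = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n  # always > 0: both margins are observed
            stat += (joint.get((r, c), 0) - expected) ** 2 / expected
    return stat


def discretize(values, bins=3):
    """Assumed equal-width binning so numeric attributes fit a contingency table."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant column: everything lands in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def chi_square_weights(rows, numeric):
    """Hypothetical weighting: total chi-square association of each attribute
    with all other attributes, normalized so the weights sum to 1."""
    n_attrs = len(rows[0])
    columns = [[r[j] for r in rows] for j in range(n_attrs)]
    coded = [discretize(c) if j in numeric else c for j, c in enumerate(columns)]
    scores = [0.0] * n_attrs
    for a, b in combinations(range(n_attrs), 2):
        s = chi_square(coded[a], coded[b])
        scores[a] += s
        scores[b] += s
    total = sum(scores)
    if total == 0:  # no association anywhere: fall back to uniform weights
        return [1.0 / n_attrs] * n_attrs
    return [s / total for s in scores]


def distance_matrix(rows, numeric, weights):
    """Weighted mixed distance: range-scaled |x - y| for numeric attributes,
    0/1 mismatch for categorical attributes (an assumed Gower-style form)."""
    n, n_attrs = len(rows), len(rows[0])
    ranges = {}
    for j in numeric:
        col = [r[j] for r in rows]
        ranges[j] = (max(col) - min(col)) or 1.0
    dist = [[0.0] * n for _ in range(n)]
    for i, k in combinations(range(n), 2):
        d = 0.0
        for j in range(n_attrs):
            if j in numeric:
                d += weights[j] * abs(rows[i][j] - rows[k][j]) / ranges[j]
            else:
                d += weights[j] * (rows[i][j] != rows[k][j])
        dist[i][k] = dist[k][i] = d
    return dist


def single_linkage(dist, k):
    """Stand-in 'traditional' clustering: merge the closest pair of clusters
    (single linkage) until k clusters remain."""
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] |= clusters[b]  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return clusters


# Hypothetical toy records: (age, income, gender, owns_home); columns 0 and 1 are numeric.
rows = [
    (25, 30000.0, "F", "yes"),
    (27, 32000.0, "F", "yes"),
    (52, 90000.0, "M", "no"),
    (55, 88000.0, "M", "no"),
]
numeric = {0, 1}
weights = chi_square_weights(rows, numeric)
dist = distance_matrix(rows, numeric, weights)
clusters = single_linkage(dist, k=2)
```

In this toy data every attribute separates the records into the same two groups, so all four weights come out equal (0.25 each) and the clustering step recovers the groups {0, 1} and {2, 3}. On real data the weights would differ per attribute, which is how such a scheme rebalances categorical against numerical contributions; the paper's actual weighting formula may differ from the one assumed here.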
About the journal
Journal: Research Journal of Applied Sciences, Engineering and Technology
Publisher: Maxwell Science Publications
ISSN: 2040-7459