Header menu link for other important links
Clustering of mixed datasets using deep learning algorithm
Balaji K., Lavanya K.,
Published in Elsevier BV

The performance of a clustering algorithm is highly dependent on the quality and quantity of the training dataset. Deep learning is one of the most popular and successful technique for clustering of datasets with high quality. Typically, most of the datasets contain mixed numeric and categorical data attributes. The clustering of such different types of data is a complex issue. Deep learning methods, the state-of-the-art classifiers, with better learning procedures and computational resources, can fill these gaps. To improve the robustness of clusters, we propose a Constraint-Based Deep Convolutional Generative Adversarial Network (CB-DCGANs) framework for generating simulated data to augment the training set to improve the performance of the clustering algorithm. We evaluated the performance of an end-to-end Deep Convolutional Neural Network (DCNN) in detecting the clusters from given datasets. The results from CB-DCGANs with DCNN yielded baseline accuracies of 0.8853 for heart disease dataset. In chemoinformatics datasets proposed algorithm yielded accuracies of 0.965 for kaggle dataset, 0.987 for factors dataset, 0.952 for kinase dataset. This study shows that using generative adversarial networks for clustering augmentation can significantly improve performance, especially in real-life applications.

About the journal
JournalData powered by TypesetChemometrics and Intelligent Laboratory Systems
PublisherData powered by TypesetElsevier BV
Open AccessNo