Implementing GloVe for context based k-means++ clustering
Published in IEEE
Pages: 1041 - 1046
In this paper, we implement a unique form of clustering that takes a non-numeric dataset and clusters it with the help of the word embeddings provided by the GloVe dataset. A word embedding is generated for each item in the dataset to be clustered, using the GloVe vector representation of the corresponding words. We then perform dimensionality reduction on the embedded data to determine an appropriate number of dimensions for cluster formation. The data is then clustered using k-means++. This paper provides one way to overcome the limitation of k-means clustering with respect to initialising the cluster centres, and hence yields better-quality clusters. On synthetic examples, the standard k-means method does not perform well, because random seeding inevitably merges clusters together and the algorithm is then unable to split them apart. The careful seeding method used by k-means++ prevents this problem and hence usually gives optimal results, even on synthetic datasets. © 2017 IEEE.
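The seeding step that distinguishes k-means++ from plain k-means can be illustrated with a minimal sketch. It is not the paper's implementation: the `toy` table below is a hypothetical stand-in for GloVe vectors after dimensionality reduction (the values are invented for illustration), and the function names are assumptions. Each new centre is drawn with probability proportional to its squared distance from the nearest centre already chosen, which makes it unlikely that two initial centres land in the same cluster.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_init(points, k, rng):
    """Choose k initial centres by D^2 sampling (k-means++ seeding).

    Assumes the data contains at least k distinct points; otherwise all
    weights can become zero and rng.choices raises ValueError.
    """
    centres = [rng.choice(points)]  # first centre: uniform at random
    while len(centres) < k:
        # Squared distance from each point to its nearest chosen centre.
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        # Far-away points are proportionally more likely to be picked.
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres

def kmeans(points, k, seed=0, iters=20):
    """Lloyd iterations started from k-means++ centres."""
    rng = random.Random(seed)
    centres = kmeanspp_init(points, k, rng)
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                  for p in points]
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
    return labels, centres

# Hypothetical 2-D vectors standing in for reduced GloVe embeddings.
toy = {
    "apple": (0.9, 0.1), "banana": (1.0, 0.2), "mango": (0.8, 0.0),
    "car": (0.1, 0.9), "truck": (0.0, 1.0), "bus": (0.2, 0.8),
}

words = list(toy)
labels, centres = kmeans([toy[w] for w in words], k=2)
for word, label in zip(words, labels):
    print(word, label)
```

In a real pipeline the toy table would be replaced by vectors looked up in a pretrained GloVe file and reduced with a method such as PCA; the sketch omits both steps because the abstract does not specify how they were carried out.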
About the journal
Journal: 2017 International Conference on Intelligent Sustainable Systems (ICISS)
Publisher: IEEE
Open Access: 0