Strategies for parallelizing KMeans data clustering algorithm

S. Mohanavalli; Jaisakthi S M; C. Aravindan

doi:10.1007/978-3-642-20573-6_76



Published in 2011


Volume: 147 CCIS

Pages: 427 - 430

Abstract

Data clustering is a descriptive data mining task of finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups [5]. The motivation behind this paper is to explore the KMeans partitioning algorithm on currently available parallel architectures using parallel programming models. Parallel KMeans algorithms were implemented for a shared memory model using OpenMP and for a distributed memory model using MPI. A hybrid version, using OpenMP within MPI, was also tested. The performance of the parallel algorithms was analysed to compare the speedup obtained and to study the Amdahl effect. The hybrid method reduced computation time by 50% compared to MPI and was also more efficient, with a balanced load. © 2011 Springer-Verlag.
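The abstract does not give the authors' implementation, so the following is only a minimal sketch of the data-parallel pattern commonly used for KMeans: the points are split into chunks, each worker assigns its points to the nearest centroid and accumulates per-cluster sums and counts, and a reduction combines the partial results into new centroids. The names `partial_sums` and `kmeans_step` are hypothetical, and the workers are simulated here by a sequential loop; in an OpenMP or MPI program that loop would run across threads or processes, with the combination done by a reduction.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def partial_sums(
    chunk: Sequence[Point], centroids: Sequence[Point]
) -> Tuple[List[List[float]], List[int]]:
    """Work of one (hypothetical) worker: assign each local point to its
    nearest centroid and accumulate per-cluster coordinate sums and counts."""
    k = len(centroids)
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        # Nearest centroid by squared Euclidean distance.
        best = min(
            range(k),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
        )
        counts[best] += 1
        for d in range(dim):
            sums[best][d] += p[d]
    return sums, counts


def kmeans_step(
    chunks: Sequence[Sequence[Point]], centroids: Sequence[Point]
) -> List[List[float]]:
    """One KMeans iteration over chunked data.

    The loop over chunks stands in for the parallel region; the additions
    into total_sums and total_counts stand in for the reduction step.
    """
    k = len(centroids)
    dim = len(centroids[0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for chunk in chunks:
        sums, counts = partial_sums(chunk, centroids)
        for j in range(k):
            total_counts[j] += counts[j]
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    new_centroids = []
    for j in range(k):
        if total_counts[j] == 0:
            # Assumption of this sketch: an empty cluster keeps its centroid.
            new_centroids.append([float(c) for c in centroids[j]])
        else:
            new_centroids.append([s / total_counts[j] for s in total_sums[j]])
    return new_centroids


if __name__ == "__main__":
    chunks = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
    print(kmeans_step(chunks, [(0.0, 0.0), (10.0, 10.0)]))
```

Because only the small per-cluster sums and counts are exchanged, not the points themselves, the result does not depend on how the data are split among workers, which is what makes this step a natural fit for both shared-memory and distributed-memory models.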