Several cluster analysis techniques have been developed to group objects with similar properties or characteristics. K-means clustering, proposed by MacQueen [12] in 1967, is one of the most popular statistical clustering techniques, but it cannot handle categorical data or uncertainty. Rough set theory, proposed by Pawlak [15], offers an alternative way of representing sets whose exact boundary cannot be described because of incomplete information. Rough sets have been widely used for knowledge representation, so they can also be applied to classification and are very helpful in clustering. In real-life data mining applications, clusters do not have crisp boundaries. In 2007 and 2009, Parmar et al. [14] and Tripathy et al. [16] proposed two algorithms based on rough set theory, MMR and MMeR, but both suffer from a stability problem due to multiple runs and from high time complexity. In this paper we propose a new k-means approach using rough sets that can handle heterogeneous data as well as uncertainty. © Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2012.
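
The rough set idea mentioned in the abstract, a set described by a lower and an upper approximation rather than a crisp boundary, can be illustrated with a minimal sketch. This is not the paper's algorithm; the helper names (`equivalence_classes`, `approximations`) and the toy records are assumptions made for illustration only.

```python
from collections import defaultdict

def equivalence_classes(rows, attrs):
    """Group row indices that agree on every attribute in attrs
    (the indiscernibility relation of rough set theory)."""
    classes = defaultdict(set)
    for idx, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(idx)
    return list(classes.values())

def approximations(rows, attrs, target):
    """Return (lower, upper) approximations of the index set `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(rows, attrs):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper

# Hypothetical categorical records; rows 0 and 1 are indiscernible.
rows = [
    {"colour": "red", "size": "small"},
    {"colour": "red", "size": "small"},
    {"colour": "blue", "size": "large"},
    {"colour": "blue", "size": "small"},
]
target = {0, 2}
lower, upper = approximations(rows, ["colour", "size"], target)
boundary = upper - lower  # objects that cannot be classified with certainty
```

Here row 2 certainly belongs to the target set, while rows 0 and 1 fall in the boundary region because the available attributes cannot tell them apart.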

Balakrushna Tripathy

Department of Analytics

School of Computer Science and Engineering

Vellore Campus

Ghosh A

Panda G.K.

Vellore Institute of Technology (VIT) is a private university located in Tamil Nadu, India. Founded in 1984 as Vellore Engineering College, the institution offers 20 undergraduate, 34 postgraduate, four integrated and four research programs. It has campuses in Vellore, Amravati, Bhopal and Chennai.

VIT is one of the top-ranked private universities in India according to the NIRF, THE and QS rankings. The Govt. of India has recognized VIT, Vellore as an Institution of Eminence. This has allowed VIT to take independent quality initiatives and move up in world rankings.

VIT University

Adaptive K-Means Clustering to Handle Heterogeneous Data Using Basic Rough Set Theory

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering Advances in Computer Science and Information Technology. Networks and Communications

In this paper, we implement a form of clustering that takes a non-numeric data set and clusters it with the help of the word embeddings provided by the GloVe dataset. A word embedding is generated for each item in the data set we want to cluster, using the GloVe vector representation of that word. We then perform dimensionality reduction on the data set to determine the number of dimensions appropriate for cluster formation. The data is then clustered using k-means++. This approach offers one way to overcome the limitation of k-means clustering in initialising the cluster centres, and hence gives better-quality clusters. On synthetic examples, the standard k-means method does not perform well, because random seeding inevitably merges clusters together and the algorithm is then unable to split them apart. The careful seeding used by k-means++ prevents this problem and hence usually gives optimal results even when the datasets are synthetic. © 2017 IEEE.
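
The careful seeding referred to above can be sketched as follows. This is a minimal illustration of the standard k-means++ seeding rule (each new centre is drawn with probability proportional to its squared distance from the nearest centre already chosen), not the authors' code; the function names and toy points are assumptions, and the points stand in for already-reduced embedding vectors.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_seeds(points, k, rng):
    """Pick k initial centres with the k-means++ rule.
    Assumes `points` contains at least k distinct vectors."""
    centres = [rng.choice(points)]  # first centre: uniform at random
    while len(centres) < k:
        # Squared distance from each point to its nearest chosen centre.
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        # Far-away points are proportionally more likely to be picked;
        # points already chosen have weight 0 and are never re-picked.
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres

# Hypothetical 2-D vectors forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (100.0, 100.0), (100.0, 101.0)]
seeds = kmeans_pp_seeds(points, 2, random.Random(0))
```

Because the weights grow with squared distance, the second seed almost always lands in the group that the first seed missed, which is what keeps well-separated clusters from being merged at initialisation.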

2017 International Conference on Intelligent Sustainable Systems (ICISS)

Implementing GloVe for context based k-means++ clustering

Several algorithms are used for data clustering, and because imprecision has become an inherent part of datasets nowadays, many such algorithms have been developed using fuzzy sets, rough sets, intuitionistic fuzzy sets, and their hybrid models. To increase the flexibility of conventional rough approximations, a probability-based rough set concept, decision-theoretic rough sets (DTRS), was introduced in the 1990s. Using this model, Li et al. extended the conventional rough c-means, with Euclidean distance used to measure the similarity among data. Euclidean distance, however, is known to work well only when the clusters are linearly separable, and as a remedy several kernel distances are used in the literature. We have selected three of the most popular kernels and developed an improved kernelized rough c-means algorithm. We compare its results with those of the basic decision-theoretic rough c-means on three datasets: Iris, Wine and Glass. The three kernel functions used are the Radial Basis, the Gaussian, and the hyperbolic tangent. The experimental analysis, using the DB and D indices as measures, shows improved results for the kernelized c-means. We also present various graphs to showcase the clustered data. © Springer Nature Singapore Pte Ltd. 2017.
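
Replacing Euclidean distance with a kernel distance usually means measuring distance in the feature space induced by a kernel K, via d²(x, y) = K(x, x) − 2K(x, y) + K(y, y). The sketch below illustrates that general identity; the kernel forms and parameter names (`sigma`, `scale`, `offset`) are assumptions, since the abstract does not give the exact definitions used in the paper.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2)); sigma is assumed."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def tanh_kernel(x, y, scale=1.0, offset=0.0):
    """Hyperbolic tangent kernel tanh(scale * <x, y> + offset); parameters assumed."""
    return math.tanh(scale * sum(a * b for a, b in zip(x, y)) + offset)

def kernel_sq_distance(kernel, x, y):
    """Squared distance in the kernel-induced feature space."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)

# For the Gaussian kernel K(x, x) = 1, so the distance reduces to 2 * (1 - K(x, y)).
d_same = kernel_sq_distance(gaussian_kernel, (0.0, 0.0), (0.0, 0.0))
d_far = kernel_sq_distance(gaussian_kernel, (0.0, 0.0), (3.0, 4.0))
```

Note that the hyperbolic tangent kernel is not positive semi-definite in general, so for some inputs this expression can be negative and is then not a true squared distance.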

Advances in Intelligent Systems and Computing Artificial Intelligence and Evolutionary Computations in Engineering Systems

An Analysis of Decision Theoretic Kernalized Rough C-Means

Clustering of real-life data for analysis has gained popularity, and imprecise methods and their hybrid approaches have attracted many researchers of late. Recently, the rough intuitionistic fuzzy c-means algorithm was introduced and studied by Tripathy et al. [3], and it was found to be superior to all other algorithms in this family. Kernel-based counterparts of these algorithms have been found to behave better than the corresponding Euclidean-distance-based algorithms. Very recently, a kernel-based rough fuzzy algorithm was put forth by Bhargav et al. [4]. A comparative analysis over standard datasets and images has established the superiority of this algorithm over its corresponding standard algorithm. In this paper we introduce the kernel-based rough intuitionistic fuzzy c-means algorithm and show that it is superior to all the other algorithms in this family, both the standard and the kernel-based ones. We establish this through experimental analysis, using different types of input and standard accuracy measures. © Springer International Publishing Switzerland 2014.

Smart Innovation, Systems and Technologies Advanced Computing, Networking and Informatics- Volume 1

On Kernel Based Rough Intuitionistic Fuzzy C-means Algorithm and a Comparative Analysis

Clustering has played an important role in data analysis from the beginning. The earliest clustering algorithms could handle only numerical data; K-means clustering, proposed by MacQueen [1] in 1967, is one of them. This algorithm helps us find homogeneity in the data set. K-means has been modified in many ways, and kernel-based K-means is one such modification. It applies a nonlinear transformation that maps the sample data into a high-dimensional feature space. Although kernel-based K-means performs well on almost every data set, it is unable to handle uncertainty. Since rough set theory was proposed by Pawlak [2], many clustering algorithms based on it have appeared that can handle uncertainty and heterogeneous data, and rough K-means is one of them. In this paper we propose a combination of these two methods, called kernel-based K-means using rough sets. © 2012 IEEE.

Journal	Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering Advances in Computer Science and Information Technology. Networks and Communications
Publisher	Springer Berlin Heidelberg
ISSN	1867-8211
Open Access	0