A parallel ACO algorithm to select terms to categorise longer documents
, K.R. Chandran, A. Karthik, A. Vijay Samuel
Published in Inderscience Publishers
2011
Volume: 6
Issue: 4
Pages: 238 - 248
Abstract
Text categorisation (TC) is the task of assigning predefined categories to text. Most machine learning algorithms are sensitive to the features fed into them. Feature selection (FS) is an important preprocessing step that removes redundant and irrelevant terms from the training corpus. This paper proposes an ant colony optimisation (ACO) algorithm to select features for categorising longer documents into closely related categories. The heuristic value for each word is computed from the statistical dependency of the term on a category and from its compactness value. Experiments were conducted with documents from the 20newsgroup and Reuters-21578 benchmarks. The selected features were fed into a Naïve Bayes classifier. It was observed that the performance of the classifier improves with the features selected by the proposed method. The processes involved in the algorithm are time-intensive and demand parallelism. Hence, the ACO algorithm was parallelised using the MapReduce programming model of Hadoop. © 2011 Inderscience Enterprises Ltd.
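The abstract does not give the paper's formulas, so the following is only a minimal sketch of how a generic ACO loop might select terms, not the authors' method. It assumes the per-term heuristic values (which the paper derives from statistical dependency and compactness) are already computed and passed in as a dictionary. The function name `aco_select_terms`, all parameters, the toy data, and the toy fitness function (a stand-in for classifier performance) are hypothetical. The sketch is sequential and does not attempt the MapReduce parallelisation.

```python
import random


def aco_select_terms(heuristic, fitness, k, n_ants=10, n_iters=20,
                     alpha=1.0, beta=2.0, rho=0.1, seed=0):
    """Generic ACO term selection (illustrative, not the paper's algorithm).

    heuristic: dict mapping term -> positive desirability (assumed precomputed)
    fitness:   callable scoring a frozenset of terms; higher is better
    k:         number of terms each ant selects
    """
    rng = random.Random(seed)
    terms = sorted(heuristic)
    pheromone = {t: 1.0 for t in terms}  # uniform initial pheromone
    best_subset, best_score = None, float("-inf")

    for _ in range(n_iters):
        iter_subset, iter_score = None, float("-inf")
        for _ in range(n_ants):
            candidates = list(terms)
            chosen = []
            # Each ant picks k distinct terms, biased by pheromone and heuristic.
            while len(chosen) < k and candidates:
                weights = [pheromone[t] ** alpha * heuristic[t] ** beta
                           for t in candidates]
                pick = rng.choices(candidates, weights=weights, k=1)[0]
                chosen.append(pick)
                candidates.remove(pick)
            subset = frozenset(chosen)
            score = fitness(subset)
            if score > iter_score:
                iter_subset, iter_score = subset, score

        # Evaporate pheromone everywhere, then reinforce the iteration's best subset.
        for t in terms:
            pheromone[t] *= 1.0 - rho
        for t in iter_subset:
            pheromone[t] += max(iter_score, 0.0)  # deposit must stay non-negative

        if iter_score > best_score:
            best_subset, best_score = iter_subset, iter_score

    return best_subset, best_score


# Toy usage: made-up heuristic values, with fitness as the sum of those values.
toy_heuristic = {"ball": 5.0, "game": 4.0, "team": 3.0, "the": 0.1, "and": 0.1}
best, score = aco_select_terms(
    toy_heuristic,
    fitness=lambda subset: sum(toy_heuristic[t] for t in subset),
    k=2,
)
```

In a real setting the fitness call would train and evaluate a classifier on the candidate term subset, which is the expensive step that motivates distributing the ants' work across machines.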
About the journal
Journal: International Journal of Computational Science and Engineering
Publisher: Inderscience Publishers
ISSN: 1742-7185