Classifying text with statistically selected features to closely related categories
, K.R. Chandran
Published in 2009
Pages: 297-301
Abstract
Text classification continues to be one of the most researched problems because of the continuously increasing amount of electronic documents and digital data. Classifying documents into closely related categories is the most complex task in text categorization. Feature selection is an essential preprocessing step for improving the efficiency and accuracy of text classifiers by removing redundant and irrelevant terms from the training corpus. In this paper, a novel feature selection algorithm based on chi-square statistics is proposed for the Naïve Bayes classifier. The proposed method not only identifies the features related to a class but also determines the type of dependency between each feature and the category. The performance of the classifier using features selected by the proposed method is compared with its performance using features selected by the conventional chi-square max method on closely related categories. Experiments were conducted with randomly chosen training documents from six closely related categories of the 20 Newsgroups benchmark. Experimental results show that the classifier achieves better classification accuracy with the positive features selected by the proposed method. © 2009 IEEE.
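The chi-square selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm. Documents are treated as sets of tokens. The statistic is computed from the usual 2x2 document-count table (A: in the class and containing the term; B: outside the class and containing it; C: in the class without it; D: neither). The sign of AD - CB is taken as the dependency type: positive when the term co-occurs with the class more often than expected under independence. The helper names (`contingency`, `chi_square`, `select_positive`, `select_chi_max`, `train_nb`, `predict_nb`) and the top-k-per-class rule are hypothetical. For comparison, `select_chi_max` ranks terms by their maximum chi-square over all classes regardless of sign. A small multinomial Naïve Bayes with Laplace smoothing is then trained on whichever feature set is chosen.

```python
import math
from collections import Counter


def contingency(docs, labels, term, cls):
    """Return document counts (A, B, C, D) for `term` and class `cls`."""
    a = b = c = d = 0
    for words, label in zip(docs, labels):
        has_term = term in words
        in_class = label == cls
        if has_term and in_class:
            a += 1          # in class, contains term
        elif has_term:
            b += 1          # outside class, contains term
        elif in_class:
            c += 1          # in class, lacks term
        else:
            d += 1          # outside class, lacks term
    return a, b, c, d


def chi_square(a, b, c, d):
    """Chi-square of a 2x2 term/class table plus the dependency sign (+1, -1, or 0)."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0, 0
    diff = a * d - c * b
    score = n * diff * diff / denom
    sign = 1 if diff > 0 else (-1 if diff < 0 else 0)
    return score, sign


def select_positive(docs, labels, k):
    """Hypothetical rule: per class, keep the k top-scoring positively dependent terms."""
    vocab = set().union(*docs)
    selected = set()
    for cls in sorted(set(labels)):
        scored = []
        for term in vocab:
            score, sign = chi_square(*contingency(docs, labels, term, cls))
            if sign > 0:
                scored.append((score, term))
        scored.sort(key=lambda x: (-x[0], x[1]))  # highest score first, ties by name
        selected.update(term for _, term in scored[:k])
    return selected


def select_chi_max(docs, labels, k):
    """Baseline: keep the k terms with the highest max-over-classes chi-square, any sign."""
    vocab = set().union(*docs)
    classes = sorted(set(labels))
    scored = []
    for term in vocab:
        best = max(chi_square(*contingency(docs, labels, term, cls))[0] for cls in classes)
        scored.append((best, term))
    scored.sort(key=lambda x: (-x[0], x[1]))
    return {term for _, term in scored[:k]}


def train_nb(docs, labels, features):
    """Multinomial Naive Bayes with Laplace smoothing, restricted to `features`."""
    priors, likelihoods = {}, {}
    for cls in sorted(set(labels)):
        class_docs = [d for d, l in zip(docs, labels) if l == cls]
        priors[cls] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for d in class_docs for t in d if t in features)
        total = sum(counts.values()) + len(features)
        likelihoods[cls] = {t: math.log((counts[t] + 1) / total) for t in features}
    return priors, likelihoods


def predict_nb(model, words):
    """Return the class with the highest log posterior; unselected words are ignored."""
    priors, likelihoods = model

    def score(cls):
        return priors[cls] + sum(likelihoods[cls][t] for t in words if t in likelihoods[cls])

    return max(priors, key=score)


if __name__ == "__main__":
    # Toy corpus (illustrative assumption): each document is a set of tokens.
    docs = [{"goal", "team"}, {"goal", "match"}, {"vote", "party"}, {"vote", "team"}]
    labels = ["sport", "sport", "politics", "politics"]
    features = select_positive(docs, labels, k=1)
    print(sorted(features))  # ['goal', 'vote']
    model = train_nb(docs, labels, features)
    print(predict_nb(model, {"goal", "referee"}))  # sport
```

Because `select_chi_max` ignores the sign, it can keep terms whose high score comes from being conspicuously absent from one class. Restricting the selection to positively dependent terms, as `select_positive` does, keeps only terms that count as evidence for a class, which is one plausible reading of the abstract's "positive features".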
Venue: ARTCom 2009 - International Conference on Advances in Recent Technologies in Communication and Computing