Naïve Bayes text classification with statistical data feature selection

Janaki Meena M; K.R. Chandran

Text classification continues to be one of the most researched problems because of the continuously increasing amount of electronic documents and digital data. Naïve Bayes is a simple and effective classifier for data mining tasks, but it does not give satisfactory results in automatic text classification because of the high dimensionality of the problem. Feature selection is an essential preprocessing step that improves the efficiency and accuracy of text classification algorithms by removing redundant and irrelevant terms from the training corpus. In this paper, the performance of the Naïve Bayes classifier is compared on two sets of features, one selected by the chi-square method and the other by the CHIR algorithm. Experiments were conducted on randomly selected training sets, and the performance of the classifier with words as features was analyzed. The performance of the classifier was observed to improve with the features selected by the CHIR algorithm on the benchmark text corpora: 20 Newsgroups and the classical Smart data sets.
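As an illustration of the chi-square baseline mentioned in the abstract, the following is a minimal sketch of scoring terms against one class with the standard 2x2 chi-square statistic and keeping the top-scoring terms. It is not the authors' implementation: the function names (chi_square, select_features), the whitespace tokenization, and the four toy documents are assumptions made for the example, and the CHIR algorithm itself is not shown.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # term or class is degenerate; no evidence either way
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, target, k):
    """Return the k terms with the highest chi-square score for `target`."""
    n_in = sum(1 for y in labels if y == target)
    n_out = len(labels) - n_in
    df_in, df_out = Counter(), Counter()  # document frequencies per side
    for text, y in zip(docs, labels):
        terms = set(text.lower().split())  # hypothetical tokenizer
        (df_in if y == target else df_out).update(terms)
    vocab = set(df_in) | set(df_out)
    scores = {
        t: chi_square(df_in[t], df_out[t], n_in - df_in[t], n_out - df_out[t])
        for t in vocab
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Toy corpus, invented for illustration only.
docs = [
    "graphics card driver",
    "graphics render image",
    "hockey game score",
    "hockey team game",
]
labels = ["comp", "comp", "rec", "rec"]
top_terms = select_features(docs, labels, "comp", 3)
```

In this toy corpus the terms "graphics", "hockey" and "game" all receive the same score for the class "comp", because the plain statistic is large both for terms that signal the presence of a class and for terms that signal its absence.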

Journal	Advances in Modelling and Analysis B
ISSN	1240-4543