Map reduce based bag of phrases representation and distributional features incorporation for text classification
Published in Asian Research Publishing Network
2018
Volume: 13
Issue: 11
Pages: 3626 - 3634
Abstract
Text classification is a basic step in developing intelligent information systems for tasks such as language identification, biography generation, authorship verification, content filtering, search personalization, product classification, sentiment analysis, detection of malicious activities, patent classification and opinion mining. Since the early 1990s, various machine learning approaches have been applied to text classification. Document representation is the process of converting raw documents into a set of features that can be fed into machine learning algorithms. The features used when applying machine learning algorithms to a text corpus can be words, n-grams (phrases) or synsets. The distribution of a feature within a document also helps determine its importance. In this research, a MapReduce-based bag-of-phrases representation is used to classify text with a Naïve Bayes classifier. The proposed feature selection algorithm is converted to the MapReduce programming model and the results are discussed. Precision and recall are the metrics used in this research to compare the results. It has been observed that the bag-of-phrases representation gives better accuracy for technical documents, and that including distributional features improves the accuracy of the classifier. © 2006-2018 Asian Research Publishing Network (ARPN).
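As an illustration only, the bag-of-phrases idea can be sketched as a map step that emits word n-grams per document and a reduce step that sums their counts. This is a minimal in-memory sketch, not the paper's algorithm: the function names `map_phrases` and `reduce_counts`, the choice of bigrams (`n=2`), whitespace tokenization, and the two sample documents are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phrases(doc_id, text, n=2):
    """Map step (hypothetical): emit ((doc_id, phrase), 1) for each word n-gram."""
    tokens = text.lower().split()  # assumed tokenization: lowercase, split on whitespace
    for i in range(len(tokens) - n + 1):
        yield (doc_id, " ".join(tokens[i:i + n])), 1


def reduce_counts(pairs):
    """Shuffle-and-reduce step (hypothetical): sum the emitted counts per key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


# Made-up sample documents, used only to exercise the sketch.
docs = {
    "d1": "map reduce based text classification",
    "d2": "text classification with map reduce",
}

emitted = chain.from_iterable(map_phrases(d, t) for d, t in docs.items())
bag_of_phrases = reduce_counts(emitted)
```

The resulting per-document phrase counts are the kind of feature that could then be passed to a Naïve Bayes classifier; in a real MapReduce framework the shuffle between the two steps would be handled by the runtime rather than by a single dictionary.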
About the journal
Journal: ARPN Journal of Engineering and Applied Sciences
Publisher: Asian Research Publishing Network
ISSN: 1819-6608