NLP-MTFLR: Document-Level Prioritization and Identification of Dominant Multi-word Named Products in Customer Reviews

Sivashankari R; Valarmathi B

doi:10.1007/s13369-017-2773-y

The availability of large datasets in commercial domains has heightened the importance of data mining in the last few years. Practitioners and researchers alike rely on them to gauge the magnitude and effect of data-related problems that need solving in business environments. In recent years, the volume of online data submissions (e-commerce data) on products, services and organizations has increased exponentially. However, the submitted data are highly unstructured and largely dependent on language. Mining and extracting useful information from such data is a colossal task, as the analysis should include opinion word identification/extraction, aspect extraction and entity extraction. Of the three, entity extraction is one of the governing approaches in text analysis. It plays a major role in the e-commerce, biomedical and automobile industries, and it supports categorizing records by entity name, generating short summaries of entities and grouping similar records. The existing approaches to entity extraction are capable of recognizing and extracting single-word named entities. However, product names are often given as a sequence of words (multi-word named entities) and therefore cannot be recognized by the existing methods. To resolve this issue, this paper presents a novel approach, NLP-Modified Token-based Frequencies of Left and Right (NLP-MTFLR), which is considered an effective way to detect and extract the multi-word named products and the dominant multi-word named product from a customer review corpus. Using the NLP-MTFLR approach, subwords and multi-subwords are identified in the review corpus and mapped to their multi-word named products to recognize the dominant product of that corpus. With this dominant product identification, the proposed method shows that the identified dominant product is reviewed more heavily than the other products in that corpus. The NLP-MTFLR approach achieved 97% accuracy, 77% precision, 89% recall and 82% F-score. © 2017, King Fahd University of Petroleum & Minerals.
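The abstract does not say how the F-score was computed. As a minimal sketch, assuming it is the standard F1 measure (the harmonic mean of precision and recall), the relationship between the reported figures can be checked as follows. The function name `f1_score` is a hypothetical helper introduced here for illustration, not something taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1).

    Assumption: the paper's "F-score" is this balanced F1 measure;
    the abstract itself does not state the formula.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (rounded percentages).
reported_precision = 0.77
reported_recall = 0.89

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 from reported precision and recall: {f1:.4f}")
```

Under this assumption the rounded inputs give roughly 0.826, close to the reported 82%. A gap of that size is within what rounding of the published precision and recall can produce.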

Journal	Arabian Journal for Science and Engineering
Publisher	Springer Science and Business Media LLC
ISSN	2193-567X
Open Access	No