Processing Large Text Corpus Using N-Gram Language Modeling and Smoothing
S. Avasthi, R. Chauhan
Published in Springer Science and Business Media Deutschland GmbH
2021
Volume: 166
Pages: 21-32
Abstract
Predicting the next word, letter, or phrase while a user is typing is a valuable tool for improving user experience. Users frequently communicate, write reviews, and express opinions on social media platforms, often while on the move, so an application that reduces typing effort and spelling errors when time is limited has become necessary. Text data keeps growing because of the extensive use of all kinds of social media platforms, and the volume of text that must be processed for language modeling makes a text prediction application difficult to implement. The primary objective of this paper is to process a large text corpus and implement a probabilistic N-gram model that predicts the next word from the user's input. In this exploratory research, N-gram models are discussed and evaluated using Good-Turing estimation, the perplexity measure, and the type-to-token ratio. © 2021, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
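The techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the whitespace tokenizer, and the toy corpus are assumptions made for illustration. The perplexity function uses add-one (Laplace) smoothing as a simple stand-in for the Good-Turing estimation used in the paper; the only Good-Turing quantity computed here is the probability mass reserved for unseen bigrams, N1 / N.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately naive tokenizer)."""
    return text.lower().split()


def train_bigrams(tokens):
    """Map each word to a Counter of the words observed immediately after it."""
    bigrams = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        bigrams[prev][nxt] += 1
    return bigrams


def predict_next(bigrams, word, k=3):
    """Return up to k most frequent followers of `word` (empty list if unseen)."""
    followers = bigrams.get(word, Counter())
    return [w for w, _ in followers.most_common(k)]


def type_token_ratio(tokens):
    """Number of distinct word types divided by the number of tokens."""
    return len(set(tokens)) / len(tokens)


def good_turing_unseen_mass(bigrams):
    """Good-Turing estimate of the total probability of unseen bigrams: N1 / N."""
    counts = [c for followers in bigrams.values() for c in followers.values()]
    total = sum(counts)
    singletons = sum(1 for c in counts if c == 1)
    return singletons / total if total else 0.0


def perplexity(tokens, bigrams, vocab_size):
    """Bigram perplexity with add-one smoothing (a stand-in, not Good-Turing)."""
    log_prob = 0.0
    n = 0
    for prev, nxt in zip(tokens, tokens[1:]):
        followers = bigrams.get(prev, Counter())
        p = (followers[nxt] + 1) / (sum(followers.values()) + vocab_size)
        log_prob += math.log(p)
        n += 1
    return math.exp(-log_prob / n)
```

For example, after `tokens = tokenize("the cat sat on the mat the cat ate")` and `model = train_bigrams(tokens)`, calling `predict_next(model, "the", k=1)` returns the most frequent follower of "the" in that toy corpus. On a real corpus, perplexity would be measured on held-out text rather than on the training tokens.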
About the journal
Journal: Lecture Notes in Networks and Systems
Publisher: Springer Science and Business Media Deutschland GmbH
ISSN: 2367-3370