Processing Large Text Corpus Using N-Gram Language Modeling and Smoothing

S. Avasthi; R. Chauhan; Debi Prasanna Acharjya

doi:10.1007/978-981-15-9689-6_3

Published in Springer Science and Business Media Deutschland GmbH

2021

Volume: 166

Pages: 21–32

Abstract

Predicting the next word, letter, or phrase while a user is typing is a valuable tool for improving the user experience. Users frequently communicate, write reviews, and express their opinions on social media platforms, often while on the move. It has therefore become necessary to provide users with an application that reduces typing effort and spelling errors when they have limited time. Because all kinds of social media platforms are used so extensively, text data keeps growing in size, which makes implementing a text prediction application difficult given the volume of text that must be processed for language modeling. The primary objective of this paper is to process a large text corpus and implement a probabilistic model, such as an n-gram model, to predict the next word from the user's input. In this exploratory research, n-gram models are discussed and evaluated using Good-Turing estimation, the perplexity measure, and the type-to-token ratio. © 2021, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
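The abstract names its techniques without showing them. As a rough illustration only, the Python sketch below is not the authors' implementation; their tokenization, n-gram order, and smoothing details are not given here. It builds bigram counts from a toy corpus, ranks next-word candidates, applies the basic Good-Turing re-estimate c* = (c + 1) N_{c+1} / N_c, and computes perplexity and the type-to-token ratio. The whitespace tokenizer, the bigram order, the toy corpus, and all function names are assumptions. Perplexity here uses add-one (Laplace) smoothing, a simpler technique swapped in because basic Good-Turing counts do not form a normalized distribution on their own.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace only.
    return text.lower().split()


def count_bigrams(tokens):
    # Count each adjacent word pair (w1, w2) in the token stream.
    return Counter(zip(tokens, tokens[1:]))


def predict_next(bigrams, word, k=3):
    # Rank candidate next words by how often they follow `word` in training.
    followers = Counter({w2: c for (w1, w2), c in bigrams.items() if w1 == word})
    return [w for w, _ in followers.most_common(k)]


def good_turing_counts(bigrams):
    # Basic Good-Turing re-estimate c* = (c + 1) * N_{c+1} / N_c, where N_c is the
    # number of distinct bigrams seen exactly c times. When N_{c+1} is 0 the raw
    # count is kept, because the basic formula breaks down there.
    n = Counter(bigrams.values())
    return {
        bg: (c + 1) * n[c + 1] / n[c] if n[c + 1] else float(c)
        for bg, c in bigrams.items()
    }


def unseen_mass(bigrams):
    # Good-Turing probability mass reserved for unseen bigrams: N_1 / N.
    n = Counter(bigrams.values())
    return n[1] / sum(bigrams.values())


def perplexity(test_tokens, bigrams, context_counts, vocab_size):
    # Bigram perplexity with add-one (Laplace) smoothing so unseen pairs get
    # non-zero probability: P(w2 | w1) = (C(w1 w2) + 1) / (C(w1) + V).
    pairs = list(zip(test_tokens, test_tokens[1:]))
    log_prob = sum(
        math.log((bigrams[(w1, w2)] + 1) / (context_counts[w1] + vocab_size))
        for w1, w2 in pairs
    )
    return math.exp(-log_prob / len(pairs))


def type_token_ratio(tokens):
    # Distinct word types divided by total tokens.
    return len(set(tokens)) / len(tokens)


# Toy usage on an assumed three-sentence corpus.
train = tokenize("I like green tea I like black tea you like green tea")
bigrams = count_bigrams(train)
contexts = Counter(train[:-1])  # counts of each word as the first element of a bigram
print(predict_next(bigrams, "like"))   # ['green', 'black']
print(type_token_ratio(train))         # 0.5
print(round(unseen_mass(bigrams), 3))  # 0.455
print(perplexity(tokenize("you like black tea"), bigrams, contexts, len(set(train))))
```

Lower perplexity on held-out text indicates a better fit. A larger system would typically replace the raw-count fallback in `good_turing_counts` with a smoothed N_c curve (as in Simple Good-Turing) or a backoff scheme.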

Topics: Text corpus (58%), n-gram (58%), Perplexity (56%), Language model (55%), and User experience design (54%)

About the journal

Journal: Lecture Notes in Networks and Systems
Publisher: Springer Science and Business Media Deutschland GmbH
ISSN: 2367-3370

Authors (1)

Debi Prasanna Acharjya
