Robust Mizo digit recognition using data augmentation and tonal information
B.D. Sarma, A. Dey, W. Lalhminghlui, P. Gogoi, P. Sarmah,
Published in International Speech Communications Association
2018
Volume: 2018-June
Pages: 621-625
Abstract
The performance of speech recognition systems degrades severely in noisy environments. In this work, we present a method to improve the performance of a Mizo digit recognition system under different noisy conditions using data augmentation and tonal information. Mizo is a tonal language, and each Mizo digit is spoken with one of the four tones present in the language; the tone therefore carries information about the spoken digit. Tone is related to the excitation source, and excitation source information is more robust to noise than vocal tract information. The normalized cross-correlation function, pitch, and pitch dynamics are used as additional features to represent the tonal information, and they improve on Mel frequency cepstral coefficient (MFCC) based baseline systems in noisy conditions. Data augmentation is another technique used in the literature for robust speech recognition, and its use further improves the performance of the Mizo digit recognition system. © 2018, International Speech Communications Association. All Rights Reserved.
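The tonal features named in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' feature-extraction settings, so everything below is an assumption made for illustration: the helper names (`nccf`, `pitch_from_nccf`, `append_tonal_features`), the 60-400 Hz search range, the use of the strongest NCCF peak as the pitch estimate, and a simple frame-to-frame gradient standing in for pitch dynamics.

```python
import numpy as np

def nccf(frame: np.ndarray, max_lag: int) -> np.ndarray:
    """Normalized cross-correlation of a frame with lagged copies of itself.

    A window of len(frame) - max_lag samples is used so that every lag
    compares two segments of equal length.
    """
    n = len(frame) - max_lag
    if n <= 0:
        raise ValueError("frame must be longer than max_lag")
    ref = frame[:n]
    e_ref = np.dot(ref, ref)
    out = np.zeros(max_lag + 1)
    for k in range(max_lag + 1):
        seg = frame[k:k + n]
        denom = np.sqrt(e_ref * np.dot(seg, seg))
        out[k] = np.dot(ref, seg) / denom if denom > 0 else 0.0
    return out

def pitch_from_nccf(frame, sr, fmin=60.0, fmax=400.0):
    """Return (pitch_hz, peak_nccf) from the strongest NCCF value in the lag range.

    fmin and fmax are illustrative defaults, not values from the paper.
    """
    lag_min = int(sr / fmax)
    lag_max = int(sr / fmin)
    c = nccf(np.asarray(frame, dtype=float), lag_max)
    best = lag_min + int(np.argmax(c[lag_min:lag_max + 1]))
    return sr / best, c[best]

def append_tonal_features(mfcc, pitch, nccf_peak):
    """Append NCCF peak, pitch, and a pitch-dynamics column to per-frame MFCCs.

    mfcc has shape (frames, coeffs); pitch and nccf_peak have shape (frames,).
    Pitch dynamics are approximated here by a frame-to-frame gradient.
    """
    pitch = np.asarray(pitch, dtype=float)
    nccf_peak = np.asarray(nccf_peak, dtype=float)
    delta = np.gradient(pitch)
    return np.hstack([mfcc, nccf_peak[:, None], pitch[:, None], delta[:, None]])
```

With 13 MFCCs per frame, `append_tonal_features` would yield 16-dimensional vectors. A practical front end would typically also handle unvoiced frames and smooth the pitch track, which this sketch omits.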
About the journal
Journal: Proceedings of the International Conference on Speech Prosody
Publisher: International Speech Communications Association
ISSN: 2333-2042