Pause insertion in Assamese synthesized speech using speech specific features
B. Sharma,
Published by Institute of Electrical and Electronics Engineers Inc.
2017
Abstract
Research in text-to-speech synthesis continues to pursue greater naturalness in synthesized speech, and predicting pauses from the text to be synthesized plays a vital role in achieving it. From the perspective of speech production, some speech-specific features before and after a pause may be coordinated with the pause prediction process. Based on this hypothesis, the patterns of three features, namely the modulation spectrum energy, the strength of excitation, and the peak-to-dip ratio of the smoothed Hilbert envelope of the linear prediction residual, are analyzed relative to the presence or absence of a pause at word junctures in a manually pause-marked database. While most existing work relies only on linguistic aspects to predict pause positions, in this work support vector machines (SVMs) are trained on both speech-based and linguistic features to predict the position of pauses at word junctures. Adding speech-based evidence improves pause prediction accuracy to 96.57%, compared with 90.07% when only semantic features are used. The same SVM classifier is used for pause insertion in the synthesized speech. For this, speech is first synthesized without any pause prediction, and signal-based features are derived from it at each word juncture. Pauses are then inserted in the synthesized speech according to the output of the previously trained classifier for these features. Subjective evaluation shows improved naturalness and intelligibility of the synthesized speech after applying the proposed pause insertion method. © 2017 IEEE.
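The two-pass insertion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `insert_pauses`, the `predict_pause` callable (standing in for the trained SVM), the per-word sample lists, and the fixed 200 ms pause length are all assumptions made for illustration. The abstract does not state how long the inserted pauses are or how the features are computed.

```python
from typing import Callable, List, Sequence


def insert_pauses(
    word_segments: Sequence[Sequence[float]],
    juncture_features: Sequence[Sequence[float]],
    predict_pause: Callable[[Sequence[float]], bool],
    sample_rate: int = 16000,
    pause_ms: int = 200,  # assumed pause length; not given in the abstract
) -> List[float]:
    """Concatenate word waveforms, adding silence where a pause is predicted.

    word_segments: samples of each word from pause-free synthesis.
    juncture_features: one feature vector per juncture between adjacent words.
    predict_pause: hypothetical stand-in for the trained classifier.
    """
    expected_junctures = max(len(word_segments) - 1, 0)
    if len(juncture_features) != expected_junctures:
        raise ValueError("expected one feature vector per word juncture")

    # A pause is modeled here as a run of zero-valued samples.
    silence = [0.0] * (sample_rate * pause_ms // 1000)

    output: List[float] = []
    for index, segment in enumerate(word_segments):
        output.extend(segment)
        # Junctures exist only between words, so the last word has none.
        if index < expected_junctures and predict_pause(juncture_features[index]):
            output.extend(silence)
    return output


# Toy data: three "words" and two junctures with one feature each.
words = [[0.1, 0.2], [0.3], [0.4, 0.5]]
features = [[0.9], [0.1]]


def toy_classifier(feature_vector: Sequence[float]) -> bool:
    """Placeholder decision rule, not a trained SVM."""
    return feature_vector[0] > 0.5


result = insert_pauses(words, features, toy_classifier, sample_rate=1000, pause_ms=3)
```

In this toy call only the first juncture is classified as a pause, so three zero samples are placed between the first and second words. In a fuller pipeline, `predict_pause` would presumably wrap the SVM trained on the speech-based and linguistic features named in the abstract.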
About the journal
Journal: 2017 23rd National Conference on Communications, NCC 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.