Pause insertion in Assamese synthesized speech using speech specific features

B. Sharma; Prasanna S

doi:10.1109/NCC.2017.8077129



Published by Institute of Electrical and Electronics Engineers Inc.

2017


Abstract

Research in text-to-speech synthesis continues to pursue greater naturalness in synthesized speech. Pause prediction from the text to be synthesized plays a vital role in achieving naturalness. From the perspective of speech production, some speech-specific features before and after a pause may be coordinated with the pause prediction process. Based on this hypothesis, the patterns of three features, namely the modulation spectrum energy, the strength of excitation, and the peak-to-dip ratio from the smoothed Hilbert envelope of the linear prediction residual, are analyzed relative to the presence or absence of a pause at word junctures in a manually pause-marked database. While most existing works rely only on linguistic aspects to predict pause positions, in this work support vector machines (SVMs) are trained using both speech-based and linguistic features to predict the position of pauses at word junctures. Adding speech-based evidence improves pause prediction accuracy to 96.57%, compared with 90.07% when only semantic features are used. The same SVM classifier is used for pause insertion in synthesized speech. For this, speech is first synthesized without any pause prediction, and signal-based features are derived from it at each word juncture. Based on the output of the previously trained classifier for these features, pauses are inserted in the synthesized speech. Subjective evaluation shows improved naturalness and intelligibility of the synthesized speech after applying the proposed pause insertion method. © 2017 IEEE.
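The final step described in the abstract, inserting pauses at word junctures according to a classifier decision, might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `insert_pauses`, the fixed pause length `pause_ms`, the use of zero-valued samples as silence, and the `toy_classifier` stand-in for the trained SVM are all hypothetical, since the abstract does not specify them.

```python
from typing import Callable, List, Sequence


def insert_pauses(
    samples: Sequence[float],
    juncture_positions: Sequence[int],
    juncture_features: Sequence[Sequence[float]],
    predict_pause: Callable[[Sequence[float]], bool],
    sample_rate: int = 16000,
    pause_ms: int = 200,  # assumed fixed pause length; not given in the abstract
) -> List[float]:
    """Return a copy of `samples` with silence inserted at predicted pauses.

    `juncture_positions` holds the sample index of each word juncture in the
    speech synthesized without pauses, and `juncture_features` holds one
    feature vector per juncture. `predict_pause` stands in for the trained
    classifier and returns True where a pause should be inserted.
    """
    if len(juncture_positions) != len(juncture_features):
        raise ValueError("one feature vector is needed per word juncture")

    # Assumption: a pause is modelled as a run of zero-valued samples.
    silence = [0.0] * (sample_rate * pause_ms // 1000)

    output: List[float] = []
    previous = 0
    # Walk the junctures in time order, copying speech up to each one.
    pairs = sorted(zip(juncture_positions, juncture_features), key=lambda p: p[0])
    for position, features in pairs:
        output.extend(samples[previous:position])
        if predict_pause(features):
            output.extend(silence)
        previous = position
    # Copy whatever speech remains after the last juncture.
    output.extend(samples[previous:])
    return output


def toy_classifier(features: Sequence[float]) -> bool:
    # Hypothetical stand-in rule; a real system would query a trained SVM here.
    return features[0] > 0.5


# Ten samples of "speech" with junctures at indices 4 and 8.
speech = [0.1] * 10
padded = insert_pauses(
    speech, [4, 8], [[0.9], [0.2]], toy_classifier, sample_rate=1000, pause_ms=3
)
```

In this toy run only the first juncture is classified as a pause, so three silent samples are inserted at index 4 and the rest of the signal is left unchanged. Keeping the classifier behind a plain callable means the insertion logic does not depend on any particular SVM library.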

Topics: Speech processing (64%), Voice activity detection (59%), Intelligibility (communication) (57%) and Speech production (55%)


About the journal

Journal	2017 23rd National Conference on Communications, NCC 2017
Publisher	Institute of Electrical and Electronics Engineers Inc.

Authors (1)

Prasanna S
