Spoken term detection from continuous speech using ANN posteriors and image processing techniques
R. Shankar, A. Jain, K.T. Deepak, C.M. Vikram,
Published in Institute of Electrical and Electronics Engineers Inc.
2016
Abstract
The objective of the current work is to demonstrate the significance of morphological image processing techniques in spoken term detection from continuous speech. The phone posterior probabilities for the reference speech data and the query word are obtained from a hybrid phoneme recognizer based on a Hidden Markov Model (HMM) and an Artificial Neural Network (ANN). The phone posteriors of the query word and the reference data are matched using the non-segmental Dynamic Time Warping (DTW) technique. To decide whether a keyword is present or absent in a particular reference file, an image-processing-based approach is proposed. The DTW accumulation matrix is viewed as a grayscale image and processed using binarization and skeletonization operations. The keyword is judged to be present when a diagonal streak of dark pixels is observed in the processed image. The phoneme recognizer is trained on the TIMIT training set, and a set of twenty randomly chosen words from the TIMIT test data is used as keywords. The algorithm is evaluated for each keyword against the entire TIMIT test data as the reference, and an accuracy of about 85% with an error rate of less than 8% is noted. © 2016 IEEE.
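The matching and decision steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Everything here is an assumption made for illustration: the function names, the negative-log dot-product distance, the free-start recursion used to approximate non-segmental matching, the per-frame threshold of 0.5, and the toy three-phone posteriors. Skeletonization is omitted, and a simple reachability check stands in for visually observing the dark streak.

```python
import math

def frame(k, n_phones=3):
    """Toy posterior vector: 0.9 on phone k, the rest shared equally (assumed data)."""
    rest = 0.1 / (n_phones - 1)
    return [0.9 if c == k else rest for c in range(n_phones)]

def local_distance(p, q, eps=1e-10):
    """Negative log of the dot product of two posterior vectors (assumed measure)."""
    return -math.log(max(sum(a * b for a, b in zip(p, q)), eps))

def accumulation_matrix(query, reference):
    """Accumulated DTW cost; the query may start at any reference frame."""
    n, m = len(query), len(reference)
    acc = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = local_distance(query[i], reference[j])
            if i == 0:
                acc[i][j] = d  # free start: nothing carried in from the left
            elif j == 0:
                acc[i][j] = d + acc[i - 1][j]
            else:
                acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc

def binarize(acc, threshold):
    """Mark a pixel dark (True) when its cost per query frame is below threshold."""
    return [[row[j] / (i + 1) < threshold for j in range(len(row))]
            for i, row in enumerate(acc)]

def has_dark_streak(dark):
    """True if dark pixels link the top row to the bottom row through
    down, right, or diagonal steps (a stand-in for inspecting the image)."""
    n, m = len(dark), len(dark[0])
    reach = [[False] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if not dark[i][j]:
                continue
            if i == 0:
                reach[i][j] = True
            else:
                reach[i][j] = reach[i - 1][j] or (
                    j > 0 and (reach[i][j - 1] or reach[i - 1][j - 1]))
    return any(reach[n - 1])

def detect(query, reference, threshold=0.5):
    """Chain the three steps: accumulate, binarize, look for a streak."""
    return has_dark_streak(binarize(accumulation_matrix(query, reference), threshold))

query = [frame(k) for k in (0, 1, 2)]            # keyword: phones 0, 1, 2
hit = [frame(k) for k in (2, 2, 0, 1, 2, 0)]     # contains 0, 1, 2 in order
miss = [frame(k) for k in (2, 2, 1, 0, 0, 2)]    # does not contain the keyword

print(detect(query, hit), detect(query, miss))
```

In this toy setup the threshold is fixed by hand. With real posteriors it would have to be tuned on held-out data, and a library skeletonization routine could be applied to the binarized matrix before looking for the streak.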
About the journal
Journal: 2016 22nd National Conference on Communication, NCC 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.