Spoken Keyword Detection using joint DTW-CNN
R. Shankar, C.M. Vikram,
Published in International Speech Communication Association
2018
Volume: 2018-September
Pages: 117-121
Abstract
A method for detecting spoken keywords in a given speech utterance is proposed, called joint Dynamic Time Warping (DTW)-Convolutional Neural Network (CNN). It combines the DTW approach with a strong classifier, a CNN. These two methods have independently shown significant results in optimal sequence alignment and object recognition, respectively. The proposed method modifies the original DTW formulation and converts the warping matrix into a grayscale image. A CNN is trained on these images to classify the presence or absence of a keyword by identifying the texture of the warping matrix. Experiments were conducted on the TIMIT corpus, and the method shows significant improvement over other existing techniques. © 2018 International Speech Communication Association. All rights reserved.
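The abstract's central step, turning a DTW warping matrix into a grayscale image for a CNN, can be illustrated with a minimal sketch. It assumes the textbook DTW recursion with Euclidean frame distances and simple min-max scaling to 8-bit pixels. The abstract does not describe the authors' modified DTW formulation, so this is not their method, and the function names `dtw_cost_matrix` and `to_grayscale` are hypothetical.

```python
import numpy as np


def dtw_cost_matrix(query, utterance):
    """Accumulated cost matrix of standard DTW (assumed, not the paper's variant).

    Both inputs are feature sequences shaped (frames, dims).
    """
    q = np.asarray(query, dtype=float)
    u = np.asarray(utterance, dtype=float)
    # Euclidean distance between every query frame and every utterance frame.
    local = np.linalg.norm(q[:, None, :] - u[None, :, :], axis=-1)
    n, m = local.shape
    # Pad with an infinite border so the recursion needs no edge cases.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = local[i - 1, j - 1] + best_prev
    return acc[1:, 1:]


def to_grayscale(matrix):
    """Min-max scale a matrix to uint8 pixel values in [0, 255]."""
    lo, hi = matrix.min(), matrix.max()
    if hi == lo:
        # A constant matrix carries no texture; map it to black.
        return np.zeros(matrix.shape, dtype=np.uint8)
    return np.round(255 * (matrix - lo) / (hi - lo)).astype(np.uint8)


# Illustrative random features standing in for keyword and utterance frames.
rng = np.random.default_rng(0)
keyword_feats = rng.normal(size=(5, 3))
utterance_feats = rng.normal(size=(8, 3))
image = to_grayscale(dtw_cost_matrix(keyword_feats, utterance_feats))
```

An image like this, one per keyword and utterance pair, is the kind of input a binary CNN classifier could be trained on to decide whether the keyword is present.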
About the journal
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association
ISSN: 2308-457X