Shouted and Normal Speech Classification Using 1D CNN

S. Baghel; M. Bhattacharjee; Prasanna S; P. Guha

doi:10.1007/978-3-030-34872-4_52

Published in Springer

2019

Volume: 11942 LNCS

Pages: 472 - 480

Abstract

Automatic shouted speech detection systems usually model the spectral characteristics of shouted speech to differentiate it from normal speech. Most prior work has explored hand-crafted features for this purpose. However, many works on audio processing suggest that approaches based on automatic feature learning are more robust than hand-crafted feature engineering. This work re-demonstrates this notion by proposing a 1D-CNN architecture for the shouted and normal speech classification task. The CNN learns features from the magnitude spectrum of speech frames. Classification is performed by fully connected layers at later stages of the network. The performance of the proposed architecture is evaluated on three datasets and validated against three existing approaches. As an additional contribution, a discussion of the features learned by the CNN kernels is provided with relevant visualizations. © 2019, Springer Nature Switzerland AG.
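The pipeline outlined in the abstract (magnitude spectrum of speech frames, 1D convolution, then fully connected layers) can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the frame length, hop, FFT size, kernel count and width, pooling size, and hidden-layer size below are all assumed values chosen for illustration, and the weights are random and untrained, so the output probabilities carry no meaning.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def magnitude_spectrum(frames, n_fft=512):
    """Hamming-window each frame and return |FFT|, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hamming(frames.shape[1])
    return np.abs(np.fft.rfft(frames * window, n=n_fft, axis=1))

def conv1d_relu(x, kernels, bias):
    """Valid 1-D cross-correlation along the frequency axis, then ReLU.
    x: (n_frames, n_bins), kernels: (n_kernels, width)."""
    width = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, width, axis=1)
    out = np.einsum("fow,kw->fko", windows, kernels) + bias[None, :, None]
    return np.maximum(out, 0.0)

def max_pool(x, size):
    """Non-overlapping max pooling over the last axis (remainder is dropped)."""
    n = (x.shape[-1] // size) * size
    return x[..., :n].reshape(*x.shape[:-1], n // size, size).max(axis=-1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_params(n_bins, rng, n_kernels=8, width=9, pool=4, hidden=32):
    """Random, untrained weights; every size here is an assumption."""
    n_feat = n_kernels * ((n_bins - width + 1) // pool)
    return {
        "pool": pool,
        "k": rng.normal(0, 0.1, (n_kernels, width)), "kb": np.zeros(n_kernels),
        "w1": rng.normal(0, 0.1, (n_feat, hidden)), "b1": np.zeros(hidden),
        "w2": rng.normal(0, 0.1, (hidden, 2)), "b2": np.zeros(2),
    }

def forward(spec, p):
    """Conv features -> flatten -> two fully connected layers -> class probabilities."""
    h = max_pool(conv1d_relu(spec, p["k"], p["kb"]), p["pool"])
    h = h.reshape(len(spec), -1)
    h = np.maximum(h @ p["w1"] + p["b1"], 0.0)
    return softmax(h @ p["w2"] + p["b2"])

rng = np.random.default_rng(0)
signal = rng.normal(size=16000)            # stand-in for 1 s of 16 kHz speech
spec = magnitude_spectrum(frame_signal(signal))
probs = forward(spec, init_params(spec.shape[1], rng))
print(probs.shape)                         # one (shouted, normal) pair per frame
```

Convolving along the frequency axis lets each kernel respond to a local spectral pattern wherever it occurs in the frame, which is one plausible reading of "learns features from the magnitude spectrum"; training such a network would additionally need a loss and an optimizer, which this sketch omits.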

Topics: Voice activity detection (65%), Feature engineering (57%), Feature learning (55%) and Audio signal processing (52%)

About the journal

Journal	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher	Springer
ISSN	0302-9743
