Generative Model Driven Representation Learning in a Hybrid Framework for Environmental Audio Scene and Sound Event Recognition
S. Chandrakala,
Published in Institute of Electrical and Electronics Engineers Inc.
2020
Volume: 22
Issue: 1
Pages: 3 - 14
Abstract
The analysis of sound information is helpful for audio surveillance, multimedia information retrieval, audio tagging, and forensic applications. Environmental audio scene recognition (EASR) and sound event recognition (SER) for audio surveillance are challenging tasks due to the presence of multiple sound sources, background noise, and overlapping or polyphonic contexts. We focus on learning robust and compact representations for environmental audio scenes and sound events using mel-frequency cepstral coefficients as basic features, which have proved to be effective in speech and audio-related tasks. In this paper, we propose a common hybrid model-based framework that learns representations with the help of generative models. We explore instance-specific adapted Gaussian mixture models for environmental audio scenes and instance-specific hidden Markov models for sound events to compute robust, compact, and discriminative representations. A discriminative model-based classifier is then used to recognize these representations as environmental audio scenes and sound events. The performance of the proposed approaches is evaluated using the DCASE2013 scene dataset and the TUT-DCASE2016 scene dataset for the EASR task. The Environmental Sound Classification (ESC-10) and UrbanSound8K datasets are used for the SER task. The recognition accuracy of the proposed framework is significantly better than that of many state-of-the-art approaches proposed in the recent literature. The discriminative nature of the model-driven representations leads to improved efficiency for the EASR and SER tasks. The proposed approaches are more suitable for tasks with less training data. © 1999-2012 IEEE.
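To make the idea of an instance-specific adapted Gaussian mixture model more concrete, the following is a minimal sketch and not the paper's implementation. It assumes a diagonal-covariance background GMM that has already been trained, and it uses classic mean-only MAP adaptation with a relevance factor, which is one common way to adapt a GMM to a single recording; the abstract does not state which adaptation scheme is used. The function names, the `relevance` parameter, and the toy numbers are all hypothetical. Each frame is assumed to be a feature vector such as an MFCC vector.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def responsibilities(x, weights, means, variances):
    """Posterior probability of each mixture component given frame x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the max for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a background GMM to one clip's frames.

    Assumption: this adaptation rule is a common choice, not necessarily
    the one used in the paper.
    """
    k, d = len(means), len(means[0])
    counts = [0.0] * k                      # soft frame counts per component
    sums = [[0.0] * d for _ in range(k)]    # posterior-weighted frame sums
    for x in frames:
        post = responsibilities(x, weights, means, variances)
        for c in range(k):
            counts[c] += post[c]
            for j in range(d):
                sums[c][j] += post[c] * x[j]
    adapted = []
    for c in range(k):
        # alpha moves from 0 (keep the prior mean) towards 1 (trust the data)
        alpha = counts[c] / (counts[c] + relevance)
        if counts[c] > 0:
            data_mean = [s / counts[c] for s in sums[c]]
        else:
            data_mean = means[c]
        adapted.append(
            [alpha * e + (1.0 - alpha) * m for e, m in zip(data_mean, means[c])]
        )
    return adapted

def supervector(adapted_means):
    """Concatenate adapted means into one fixed-length representation."""
    return [value for mean in adapted_means for value in mean]

# Toy background model: 2 components in 2 dimensions (hypothetical values).
bg_weights = [0.5, 0.5]
bg_means = [[0.0, 0.0], [10.0, 10.0]]
bg_vars = [[1.0, 1.0], [1.0, 1.0]]

# One "clip" whose frames all sit close to the first component.
clip_frames = [[1.0, 1.0]] * 16
clip_vector = supervector(
    map_adapt_means(clip_frames, bg_weights, bg_means, bg_vars)
)
```

In this sketch every clip, whatever its duration, maps to a vector of the same length (number of components times feature dimension), which is what allows a discriminative classifier to be trained on the resulting representations.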
About the journal
Journal: IEEE Transactions on Multimedia
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISSN: 1520-9210