Understanding Lombard speech: a review of compensation techniques towards improving speech based recognition systems

S. Uma Maheswari; A. Shahina; Nayeemulla Khan

doi:10.1007/s10462-020-09907-5

Building voice-based Artificial Intelligence (AI) systems that can efficiently interact with humans through speech has become plausible today due to rapid strides in efficient data-driven AI techniques. Such a human–machine voice interaction in real world would often involve a noisy ambience, where humans tend to speak with additional vocal effort than in a quiet ambience, to mitigate the noise-induced suppression of vocal self-feedback. This noise induced change in the vocal effort is called Lombard speech. In order to build intelligent conversational devices that can operate in a noisy ambience, it is imperative to study the characteristics and processing of Lombard speech. Though the progress of research on Lombard speech started several decades ago, it needs to be explored further in the current scenario which is seeing an explosion of voice-driven applications. The system designed to work with normal speech spoken in a quiet ambience fails to provide the same performance in changing environmental contexts. Different contexts lead to different styles of Lombard speech and hence there arises a need for efficient ways of handling variations in speaking styles in noise. The Lombard speech is also more intelligible than normal speech of a speaker. Applications like public announcement systems with speech output interface should talk with varying degrees of vocal effort to enhance naturalness in a way that humans adapt to speak in noise, in real time. This review article is an attempt to summarize the progress of work on the possible ways of processing Lombard speech to build smart and robust human–machine interactive systems with speech input–output interface, irrespective of operating environmental contexts, for different application needs. This article is a comprehensive review of the studies on Lombard speech, highlighting the key differences observed in acoustic and perceptual analysis of Lombard speech and detailing the Lombard effect compensation methods towards improving the robustness of speech based recognition systems. © 2020, Springer Nature B.V.

Journal	Data powered by TypesetArtificial Intelligence Review
Publisher	Data powered by TypesetSpringer Science and Business Media B.V.
ISSN	02692821