Robust Acoustic Scene Classification in the Presence of Active Foreground Speech
Siyuan Song, Brecht Desplanques, Celest De Moor, Kris Demuynck, Nilesh Madhu

TL;DR
This paper introduces a robust iVector-based acoustic scene classification system that effectively handles foreground speech interference by using noise-floor features and multi-condition training, significantly improving accuracy in challenging real-world scenarios.
Contribution
The study proposes a novel combination of noise-floor features and multi-condition training for improved acoustic scene classification under foreground speech interference.
Findings
Noise-floor features improve classification accuracy in noisy conditions.
Multi-condition training reduces mismatch between training and testing.
The system improves accuracy by more than 25% in adverse scenarios.
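Multi-condition training, as named in the findings above, can be illustrated by mixing foreground speech into scene recordings at several levels, so that training data covers the conditions expected at test time. The following is a minimal Python sketch, not the authors' pipeline: the function names mix_at_ratio and multi_condition_set, the definition of the speech-to-background ratio, and the example ratio values are all assumptions made for illustration.

```python
import numpy as np


def mix_at_ratio(scene, speech, ratio_db):
    """Add foreground speech to a scene recording.

    ratio_db is the speech-to-background power ratio in dB
    (an assumed definition, chosen for this sketch).
    """
    scene = np.asarray(scene, dtype=float)
    speech = np.asarray(speech, dtype=float)
    # Repeat or trim the speech so it covers the whole scene recording.
    speech = np.resize(speech, scene.shape)
    scene_power = np.mean(scene ** 2)
    speech_power = np.mean(speech ** 2)
    if scene_power == 0.0 or speech_power == 0.0:
        raise ValueError("scene and speech must both be non-silent")
    # Scale the speech so its power sits ratio_db above the scene power.
    gain = np.sqrt(scene_power * 10.0 ** (ratio_db / 10.0) / speech_power)
    return scene + gain * speech


def multi_condition_set(scene, speech, ratios_db=(0.0, 5.0, 10.0)):
    """Return the clean scene plus one speech-contaminated copy per ratio.

    The ratio values are hypothetical placeholders.
    """
    clean = np.asarray(scene, dtype=float)
    return [clean] + [mix_at_ratio(scene, speech, r) for r in ratios_db]
```

Each returned signal keeps the scene label of the original recording, so a classifier trained on the full set sees the same scene both with and without a foreground talker.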
Abstract
We present an iVector-based Acoustic Scene Classification (ASC) system suited for real-life settings where active foreground speech can be present. In the proposed system, each recording is represented by a fixed-length iVector that models the recording's important properties. A regularized Gaussian backend classifier with class-specific covariance models is used to extract the relevant acoustic scene information from these iVectors. To alleviate the large performance degradation when a foreground speaker dominates the captured signal, we investigate the use of the iVector framework on Mel-Frequency Cepstral Coefficients (MFCCs) that are derived from an estimate of the noise power spectral density. This noise floor can be extracted in a statistical manner for single-channel recordings. We show that the use of noise-floor features is complementary to multi-condition training in which…
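The Gaussian backend described in the abstract can be sketched as one Gaussian per scene class, fitted on fixed-length vectors and scored by log-likelihood. The Python sketch below is illustrative only: the abstract does not state how the covariances are regularized, so the shrinkage of each class covariance toward the pooled within-class covariance, the shrinkage and ridge parameters, the equal class priors, and the class name GaussianBackend are all assumptions rather than the authors' method.

```python
import numpy as np


class GaussianBackend:
    """One Gaussian per class with a class-specific, regularized covariance."""

    def __init__(self, shrinkage=0.5, ridge=1e-6):
        self.shrinkage = shrinkage  # assumed: weight of the pooled covariance
        self.ridge = ridge          # assumed: small diagonal term for invertibility

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        dim = X.shape[1]
        self.means_, class_covs = [], []
        pooled = np.zeros((dim, dim))
        for c in self.classes_:
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            centered = Xc - mu
            scatter = centered.T @ centered
            pooled += scatter
            self.means_.append(mu)
            class_covs.append(scatter / len(Xc))
        pooled /= len(X)
        self.precisions_, self.logdets_ = [], []
        for S in class_covs:
            # Shrink the class covariance toward the pooled within-class one.
            cov = (1.0 - self.shrinkage) * S + self.shrinkage * pooled
            cov += self.ridge * np.eye(dim)
            self.precisions_.append(np.linalg.inv(cov))
            self.logdets_.append(np.linalg.slogdet(cov)[1])
        return self

    def log_likelihoods(self, X):
        """Per-class Gaussian log-likelihoods, up to a shared constant."""
        X = np.asarray(X, dtype=float)
        scores = []
        for mu, P, logdet in zip(self.means_, self.precisions_, self.logdets_):
            d = X - mu
            maha = np.einsum("ij,jk,ik->i", d, P, d)
            scores.append(-0.5 * (maha + logdet))
        return np.stack(scores, axis=1)

    def predict(self, X):
        # Equal class priors are assumed, so the best likelihood wins.
        return self.classes_[np.argmax(self.log_likelihoods(X), axis=1)]
```

With shrinkage set to 0 each class keeps its own sample covariance, and with shrinkage set to 1 all classes share the pooled covariance, so the parameter trades off class-specific modelling against estimation stability when few recordings per class are available.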
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
