Robust Acoustic Scene Classification in the Presence of Active Foreground Speech

Siyuan Song; Brecht Desplanques; Celest De Moor; Kris Demuynck; Nilesh Madhu

arXiv:2108.00912·eess.AS·August 3, 2021

TL;DR

This paper introduces a robust iVector-based acoustic scene classification system that effectively handles foreground speech interference by using noise-floor features and multi-condition training, significantly improving accuracy in challenging real-world scenarios.

Contribution

The study proposes a novel combination of noise-floor features and multi-condition training for improved acoustic scene classification under foreground speech interference.

Findings

1. Noise-floor features improve classification accuracy in noisy conditions.

2. Multi-condition training reduces the mismatch between training and test conditions.

3. The system achieves an accuracy improvement of over 25% in adverse scenarios.
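
Multi-condition training, as named in the findings above, is commonly set up by adding interfering speech to the scene recordings at several levels, so that the classifier sees both clean and contaminated versions of each scene. The following is a minimal sketch under stated assumptions, not the authors' recipe: it assumes audio held in NumPy arrays, and the function names `mix_at_sbr` and `multi_condition_set`, the speech-to-background ratio parameter `sbr_db`, and the default levels are all hypothetical.

```python
import numpy as np


def mix_at_sbr(scene, speech, sbr_db):
    """Add `speech` to `scene` at a given speech-to-background ratio in dB.

    Both inputs are 1-D sample arrays; the longer one is truncated.
    The ratio definition (mean speech power over mean scene power) is an
    assumption made for this sketch.
    """
    scene = np.asarray(scene, dtype=float)
    speech = np.asarray(speech, dtype=float)
    n = min(len(scene), len(speech))
    scene, speech = scene[:n], speech[:n]

    p_scene = np.mean(scene ** 2)
    p_speech = np.mean(speech ** 2)
    if p_speech == 0.0:
        raise ValueError("speech signal has zero power")

    # Gain that makes (gain^2 * p_speech) / p_scene equal 10^(sbr_db / 10).
    gain = np.sqrt(p_scene * 10.0 ** (sbr_db / 10.0) / p_speech)
    return scene + gain * speech


def multi_condition_set(scene, speech, sbrs_db=(0.0, 5.0, 10.0)):
    """Return the clean scene plus one mixed copy per requested ratio."""
    scene = np.asarray(scene, dtype=float)
    return [scene] + [mix_at_sbr(scene, speech, s) for s in sbrs_db]
```

Each element of the returned list would then go through the same feature extraction and carry the same scene label as the clean recording.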

Abstract

We present an iVector-based Acoustic Scene Classification (ASC) system suited for real-life settings where active foreground speech can be present. In the proposed system, each recording is represented by a fixed-length iVector that models the recording's important properties. A regularized Gaussian backend classifier with class-specific covariance models is used to extract the relevant acoustic scene information from these iVectors. To alleviate the large performance degradation when a foreground speaker dominates the captured signal, we investigate the use of the iVector framework on Mel-Frequency Cepstral Coefficients (MFCCs) that are derived from an estimate of the noise power spectral density. This noise floor can be extracted in a statistical manner for single-channel recordings. We show that the use of noise-floor features is complementary to multi-condition training in which…
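
The Gaussian backend mentioned in the abstract can be illustrated with a short sketch. It fits one Gaussian per scene class on fixed-length vectors (standing in for iVectors) and classifies by highest log-likelihood. The abstract does not state how the backend is regularized, so this sketch assumes shrinkage of each class covariance toward the pooled within-class covariance plus a small diagonal ridge; the class name `GaussianBackend`, the parameters `shrinkage` and `ridge`, and the uniform class prior are assumptions of this example, not details from the paper.

```python
import numpy as np


class GaussianBackend:
    """Per-class Gaussian classifier with shrinkage-regularized covariances."""

    def __init__(self, shrinkage=0.5, ridge=1e-6):
        self.shrinkage = shrinkage  # 0 = pure class covariance, 1 = pooled
        self.ridge = ridge          # small diagonal term for invertibility

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        dim = X.shape[1]

        means, covs, counts = [], [], []
        for c in self.classes_:
            Xc = X[y == c]
            means.append(Xc.mean(axis=0))
            covs.append(np.atleast_2d(np.cov(Xc, rowvar=False, bias=True)))
            counts.append(len(Xc))

        # Pooled within-class covariance, weighted by class size.
        pooled = sum(n * S for n, S in zip(counts, covs)) / sum(counts)

        a = self.shrinkage
        self.means_ = means
        self.covs_ = [
            (1.0 - a) * S + a * pooled + self.ridge * np.eye(dim) for S in covs
        ]
        return self

    def log_likelihoods(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        dim = X.shape[1]
        out = np.empty((X.shape[0], len(self.classes_)))
        for k, (m, S) in enumerate(zip(self.means_, self.covs_)):
            d = X - m
            _, logdet = np.linalg.slogdet(S)
            # Squared Mahalanobis distance of every row to the class mean.
            maha = np.einsum("ij,ij->i", d, np.linalg.solve(S, d.T).T)
            out[:, k] = -0.5 * (maha + logdet + dim * np.log(2.0 * np.pi))
        return out

    def predict(self, X):
        # Uniform class prior assumed: pick the most likely class.
        return self.classes_[np.argmax(self.log_likelihoods(X), axis=1)]
```

Setting `shrinkage` to 1 makes all classes share one covariance matrix, which gives linear decision boundaries; values below 1 keep class-specific covariances, in line with the abstract's description.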

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis