Classification vs. Regression in Supervised Learning for Single Channel Speaker Count Estimation

Fabian-Robert Stöter; Soumitro Chakrabarty; Bernd Edler; Emanuël A. P. Habets

arXiv:1712.04555 · eess.AS · November 5, 2019

TL;DR

This paper compares classification and regression formulations for deep neural networks that estimate the number of concurrent speakers in single-channel audio, and evaluates different input representations and architectures.

Contribution

It provides an empirical analysis of classification versus regression strategies for speaker count estimation using a DNN based on a bidirectional LSTM (Bi-LSTM).

Findings

01 Classification and regression approaches have different strengths for speaker count estimation.

02 The choice of input representation significantly affects model performance.

03 The study reports accurate count estimates for mixtures of up to ten speakers.
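Finding 02 concerns the input representation fed to the network. To make the term concrete, the sketch below computes two common time-frequency representations, a magnitude STFT and its log-compressed version, using NumPy only. The frame length (400 samples), hop size (160 samples), 16 kHz sampling rate and the function names are illustrative assumptions; this page does not state which representations or parameters the paper compares.

```python
import numpy as np


def stft_magnitude(x: np.ndarray, n_fft: int = 400, hop: int = 160) -> np.ndarray:
    """Magnitude STFT with a Hann window; returns shape (n_frames, n_fft // 2 + 1).

    Assumes len(x) >= n_fft; trailing samples that do not fill a frame are dropped.
    Parameters are hypothetical defaults, not values taken from the paper.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def log_stft(x: np.ndarray, **kwargs) -> np.ndarray:
    """Log-compressed magnitude, log(1 + |X|), which reduces the dynamic range."""
    return np.log1p(stft_magnitude(x, **kwargs))


# Example: 1 s of a 1 kHz tone at an assumed 16 kHz sampling rate.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)
S = stft_magnitude(x)
logS = log_stft(x)
print(S.shape)                          # (98, 201)
print(int(np.argmax(S.mean(axis=0))))   # 25, since bins are fs / n_fft = 40 Hz apart
```

Swapping `stft_magnitude` for `log_stft` changes only the feature scaling, which is the kind of input-representation choice the finding refers to.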

Abstract

The task of estimating the maximum number of concurrent speakers from single channel mixtures is important for various audio-based applications, such as blind source separation, speaker diarisation, audio surveillance or auditory scene classification. Building upon powerful machine learning methodology, we develop a Deep Neural Network (DNN) that estimates a speaker count. While DNNs efficiently map input representations to output targets, it remains unclear how best to handle the network output to infer integer source count estimates, as a discrete count estimate can be tackled either as a regression or as a classification problem. In this paper, we investigate this important design decision and also address complementary parameter choices such as the input representation. We evaluate a state-of-the-art DNN audio model based on a Bi-directional Long Short-Term Memory network architecture…
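The design decision described in the abstract is how a network output becomes an integer count. The sketch below contrasts the two generic options in NumPy; it is not taken from the CountNet code, and the count range 0 to 10 (`MAX_SPEAKERS`), the function names and the example outputs are illustrative assumptions.

```python
import numpy as np

# Assumption: counts range over 0..10 speakers ("up to ten speakers").
MAX_SPEAKERS = 10


def count_from_classification(probs: np.ndarray) -> int:
    """Classification head: one softmax probability per possible count.

    The estimate is the index (count) with the highest probability.
    """
    return int(np.argmax(probs))


def count_from_regression(y: float, max_count: int = MAX_SPEAKERS) -> int:
    """Regression head: a single real-valued output.

    The estimate is the output rounded to the nearest integer and
    clipped to the valid range [0, max_count].
    """
    return int(np.clip(np.rint(y), 0, max_count))


# Hypothetical network outputs for one mixture.
probs = np.array([0.01, 0.02, 0.05, 0.60, 0.20, 0.12, 0.0, 0.0, 0.0, 0.0, 0.0])
print(count_from_classification(probs))  # 3
print(count_from_regression(3.4))        # 3
print(count_from_regression(12.7))       # 10 (clipped)
```

A classification head needs a fixed maximum count and treats counts as unrelated categories, while a regression head preserves their ordering but needs a rounding step; note that `np.rint` rounds exact halves to the nearest even integer.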

Code & Models

Repositories

faroit/CountNet (TensorFlow)
