Audio Inputs for Active Speaker Detection and Localization via Microphone Array

Davide Berghi; Philip J. B. Jackson

arXiv:2307.14739 · eess.AS · July 28, 2023 · WASPAA

Open Access

TL;DR

This paper investigates how well spatial acoustic features derived from multichannel microphone-array audio work as CRNN inputs for active speaker detection and localization, and how performance varies with the number of channels and the level of additive noise.

Contribution

It compares different spatial features and evaluates their robustness to noise and array configurations for active speaker detection and localization.

Findings

01. GCC-PHAT and SALSA features improve localization accuracy.

02. Performance depends on number of microphones and noise levels.

03. Microphone array configuration impacts detection robustness.
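
The two experimental factors named in these findings, noise level and number of microphones, can be illustrated with a minimal sketch. It is not the authors' pipeline: this summary does not state the noise type, the array geometry, or which subsets were used, so the white Gaussian noise, the evenly spaced channel indices, and the helper names `add_noise_at_snr` and `select_channels` are all assumptions made for illustration.

```python
import numpy as np


def add_noise_at_snr(audio, snr_db, rng):
    """Add white Gaussian noise to `audio` (channels x samples) at a target SNR in dB.

    Assumption: white Gaussian noise; the paper's noise type is not given here.
    """
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.standard_normal(audio.shape) * np.sqrt(noise_power)
    return audio + noise


def select_channels(audio, num_channels):
    """Keep `num_channels` evenly spaced channels spanning the whole array.

    Assumption: channels are ordered along the array, so even spacing keeps
    the aperture and lowers the sampling density.
    """
    idx = np.linspace(0, audio.shape[0] - 1, num_channels).round().astype(int)
    return audio[idx], idx


# Hypothetical 16-channel, 1-second recording at 16 kHz (a 440 Hz tone per channel).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
audio = np.tile(np.sin(2 * np.pi * 440.0 * t), (16, 1))

noisy = add_noise_at_snr(audio, snr_db=10.0, rng=rng)
subset, kept = select_channels(noisy, num_channels=4)
```

Taking adjacent channels instead (for example `audio[:4]`) would shrink the aperture while keeping the spacing between microphones, which is the other kind of subset the abstract alludes to.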

Abstract

This study considers the problem of detecting and locating an active talker's horizontal position from multichannel audio captured by a microphone array. We refer to this as active speaker detection and localization (ASDL). Our goal was to investigate the performance of spatial acoustic features extracted from the multichannel audio as the input of a convolutional recurrent neural network (CRNN), in relation to the number of channels employed and additive noise. To this end, experiments were conducted to compare the generalized cross-correlation with phase transform (GCC-PHAT), the spatial cue-augmented log-spectrogram (SALSA) features, and a recently-proposed beamforming method, evaluating their robustness to various noise intensities. The array aperture and sampling density were tested by taking subsets from the 16-microphone array. Results and tests of statistical significance…
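
As a rough illustration of the first feature the abstract names, the sketch below computes GCC-PHAT for one microphone pair with NumPy. It is a textbook-style formulation, not the paper's feature extractor: the function name `gcc_phat`, the zero-padding length, the small regularization constant, and the test signal are assumptions, and the paper feeds such correlations to a CRNN rather than taking a simple peak.

```python
import numpy as np


def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the delay of `y` relative to `x` (in seconds) with GCC-PHAT.

    Returns the estimated delay and the cross-correlation over the searched lags.
    """
    n = len(x) + len(y)  # zero-pad so circular correlation does not wrap around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so index 0 corresponds to lag -max_shift and the centre to lag 0.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs, cc


# Hypothetical example: white noise reaching the second microphone 5 samples later.
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
delay = 5
y = np.concatenate((np.zeros(delay), x[:-delay]))

tau, cc = gcc_phat(x, y, fs)
estimated_lag = int(round(tau * fs))
```

A positive delay here means the sound reached the second microphone later than the first; with known microphone spacing, such pairwise delays are what carry the direction information.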

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis