Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain

Julio Wissing; Benedikt Boenninghoff; Dorothea Kolossa; Tsubasa Ochiai; Marc Delcroix; Keisuke Kinoshita; Tomohiro Nakatani; Shoko Araki; Christopher Schymura

arXiv:2102.11588 · cs.SD · February 25, 2021

TL;DR

This paper introduces a neural network-based audiovisual data fusion method that assigns spatially-dependent dynamic weights to improve multi-speaker localization accuracy, especially under challenging conditions.

Contribution

It extends dynamic stream weights to the spatial domain for audiovisual fusion, enhancing robustness in speaker localization tasks.

Findings

01 Outperforms baseline models in localization accuracy

02 Effective in noisy and poorly lit environments

03 Neural network successfully combines audio and visual cues

Abstract

Estimating the positions of multiple speakers can be helpful for tasks like automatic speech recognition or speaker diarization. Both applications benefit from a known speaker position when, for instance, applying beamforming or assigning unique speaker identities. Recently, several approaches utilizing acoustic signals augmented with visual data have been proposed for this task. However, both the acoustic and the visual modality may be corrupted in specific spatial regions, for instance due to poor lighting conditions or to the presence of background noise. This paper proposes a novel audiovisual data fusion framework for speaker localization by assigning individual dynamic stream weights to specific regions in the localization space. This fusion is achieved via a neural network, which combines the predictions of individual audio and video trackers based on their time- and…
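
The idea of region-specific stream weights can be illustrated with a small sketch. It assumes the classic log-linear stream-weight form, in which the fused score in each spatial bin is the audio probability raised to a weight times the video probability raised to one minus that weight. The function name `fuse_spatial_stream_weights`, the four-bin grid, and all numbers are hypothetical, and the weights are set by hand here, whereas the paper predicts them with a neural network. This is not the authors' implementation.

```python
import math

def fuse_spatial_stream_weights(p_audio, p_video, audio_weights, eps=1e-12):
    """Log-linear fusion of two discrete position distributions.

    audio_weights[i] in [0, 1] is the (assumed) weight given to the audio
    stream in spatial bin i; the video stream receives 1 - audio_weights[i].
    """
    log_fused = [
        w * math.log(pa + eps) + (1.0 - w) * math.log(pv + eps)
        for pa, pv, w in zip(p_audio, p_video, audio_weights)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_fused)
    unnormalized = [math.exp(v - peak) for v in log_fused]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Toy grid with four spatial bins (made-up values).
p_audio = [0.1, 0.6, 0.2, 0.1]   # audio tracker favors bin 1
p_video = [0.1, 0.1, 0.7, 0.1]   # video tracker favors bin 2

# One global weight: both modalities are trusted equally everywhere.
uniform = fuse_spatial_stream_weights(p_audio, p_video, [0.5, 0.5, 0.5, 0.5])

# Spatial weights: video is assumed unreliable in bin 2 (e.g. poor lighting),
# so that bin relies on audio alone.
spatial = fuse_spatial_stream_weights(p_audio, p_video, [0.5, 0.5, 1.0, 0.5])

print("uniform peak bin:", uniform.index(max(uniform)))
print("spatial peak bin:", spatial.index(max(spatial)))
```

In this toy case a single global weight lets the video peak in bin 2 dominate the fused estimate, while lowering the video weight only in that bin moves the fused peak to the audio estimate in bin 1 and leaves the other bins unchanged.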

Code & Models

Repositories

rub-ksv/spatial-stream-weights (PyTorch, official)