RAVSS: Robust Audio-Visual Speech Separation in Multi-Speaker Scenarios with Missing Visual Cues

Tianrui Pan; Jie Liu; Bohan Wang; Jie Tang; Gangshan Wu

arXiv:2407.19224 · cs.SD · July 31, 2024


TL;DR

This paper introduces RAVSS, a multi-speaker audio-visual speech separation framework that separates all speakers concurrently, handles missing visual cues, and achieves state-of-the-art results on the VoxCeleb2 and LRS3 datasets.

Contribution

It presents a simultaneous multi-speaker separation method with speaker-wise interactions, improving robustness to missing visual cues and outperforming existing approaches.

Findings

01

Achieves state-of-the-art separation performance on mixtures of 2, 3, 4, and 5 speakers.

02

Demonstrates robustness to missing or partial visual information.

03

Outperforms existing methods on the VoxCeleb2 and LRS3 datasets.

Abstract

While existing Audio-Visual Speech Separation (AVSS) methods primarily concentrate on the audio-visual fusion strategy for two-speaker separation, they demonstrate a severe performance drop in the multi-speaker separation scenarios. Typically, AVSS methods employ guiding videos to sequentially isolate individual speakers from the given audio mixture, resulting in notable missing and noisy parts across various segments of the separated speech. In this study, we propose a simultaneous multi-speaker separation framework that can facilitate the concurrent separation of multiple speakers within a singular process. We introduce speaker-wise interactions to establish distinctions and correlations among speakers. Experimental results on the VoxCeleb2 and LRS3 datasets demonstrate that our method achieves state-of-the-art performance in separating mixtures with 2, 3, 4, and 5 speakers,…
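The abstract does not say how the speaker-wise interactions are implemented. One plausible reading, shown here only as an illustration, is attention applied across the speaker axis, so that each speaker's features are updated with information from every other speaker in the mixture. The sketch below is an assumption and not the authors' method: the function names `softmax` and `speaker_wise_attention` are hypothetical, there are no learned projections, and it operates on a single time frame in plain Python.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def speaker_wise_attention(frame_feats):
    """Mix information across speakers for one time frame.

    frame_feats: list of S feature vectors (one per speaker), each of length D.
    Returns S updated vectors, each a weighted average of all speakers'
    vectors. Illustrative only: a real model would use learned query, key,
    and value projections, which are omitted here.
    """
    dim = len(frame_feats[0])
    scale = math.sqrt(dim)
    updated = []
    for query in frame_feats:
        # Scaled dot-product similarity between this speaker and every speaker.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in frame_feats
        ]
        weights = softmax(scores)
        # Weighted average of all speakers' features, dimension by dimension.
        mixed = [
            sum(w * value[d] for w, value in zip(weights, frame_feats))
            for d in range(dim)
        ]
        updated.append(mixed)
    return updated


# Three hypothetical speakers with 2-dimensional features.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
updated = speaker_wise_attention(feats)
```

Because all speakers are processed in one call, each output vector depends on the whole set of speakers, in contrast to the sequential approach the abstract criticizes, where each speaker is isolated in a separate pass.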


Code & Models

Repositories

pantianrui/RAVSS (PyTorch, official)
