Learning Audio-Visual Dereverberation

Changan Chen; Wei Sun; David Harwath; Kristen Grauman

arXiv:2106.07732 · cs.SD · March 15, 2023 · 5 cites

Open Access · 1 Repo

TL;DR

This paper introduces VIDA, an audio-visual dereverberation method that uses visual scene cues to improve speech quality and recognition. It is supported by a new large-scale dataset, SoundSpaces-Speech, and reports state-of-the-art results.

Contribution

The paper presents VIDA, the first end-to-end audio-visual dereverberation approach that utilizes visual scene information, along with SoundSpaces-Speech, a new large-scale dataset of speech rendered with realistic acoustics in real-world 3D scans of homes.

Findings

01. VIDA outperforms audio-only methods in speech enhancement and recognition.

02. The approach achieves state-of-the-art performance on simulated and real imagery.

03. Using visual cues significantly improves dereverberation quality.

Abstract

Reverberation not only degrades the quality of speech for human perception, but also severely impacts the accuracy of automatic speech recognition. Prior work attempts to remove reverberation based on the audio modality only. Our idea is to learn to dereverberate speech from audio-visual observations. The visual environment surrounding a human speaker reveals important cues about the room geometry, materials, and speaker location, all of which influence the precise reverberation effects. We introduce Visually-Informed Dereverberation of Audio (VIDA), an end-to-end approach that learns to remove reverberation based on both the observed monaural sound and visual scene. In support of this new task, we develop a large-scale dataset SoundSpaces-Speech that uses realistic acoustic renderings of speech in real-world 3D scans of homes offering a variety of room acoustics. Demonstrating our…
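
For context on what dereverberation has to undo, a common signal model (an assumption here, not something stated on this page) treats reverberant speech as clean speech convolved with a room impulse response (RIR). The sketch below is only a toy illustration of that model: the names `synthetic_rir` and `reverberate` are hypothetical, and the exponentially decaying noise RIR is a stand-in, not the SoundSpaces-Speech rendering pipeline and not part of VIDA.

```python
import numpy as np


def synthetic_rir(rt60_s, sample_rate, length_s=None, seed=0):
    """Toy RIR (assumption): white noise under an exponential envelope
    that has dropped by 60 dB at t = rt60_s."""
    rng = np.random.default_rng(seed)
    length_s = rt60_s if length_s is None else length_s
    n = int(length_s * sample_rate)
    t = np.arange(n) / sample_rate
    # Amplitude factor 1e-3 corresponds to -60 dB.
    envelope = np.exp(-np.log(1000.0) * t / rt60_s)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # direct-path component
    return rir


def reverberate(dry, rir):
    """Reverberant signal = dry signal convolved with the RIR,
    truncated to the length of the dry signal."""
    return np.convolve(dry, rir)[: len(dry)]


if __name__ == "__main__":
    sample_rate = 16000
    # A single click stands in for clean speech in this toy example.
    dry = np.zeros(sample_rate)
    dry[0] = 1.0
    wet = reverberate(dry, synthetic_rir(rt60_s=0.5, sample_rate=sample_rate))
    print("energy after the first sample:", float(np.sum(wet[1:] ** 2)))
```

A dereverberation model receives something like `wet` and tries to recover `dry`. Under this model the RIR depends on the room and on where the speaker is, which is the information the abstract says the visual stream can reveal.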

Code & Models

Repositories

facebookresearch/learning-audio-visual-dereverberation (PyTorch, Official)

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Music and Audio Processing