Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion

Jian Ma; Wenguan Wang; Yi Yang; Feng Zheng

arXiv:2407.10373·cs.SD·July 16, 2024

Open Access

TL;DR

This paper introduces MVSD, a diffusion-based mutual learning framework that jointly improves visual acoustic matching and dereverberation by exploiting their reciprocal relationship and learning from unpaired data, which improves audio-visual scene consistency.

Contribution

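The paper proposes a mutual learning framework that uses diffusion models for joint visual acoustic matching and dereverberation. It targets two limitations of prior methods: their dependence on scarce paired training data, and the training instability and over-smoothing of conventional GAN architectures.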

Findings

01 Improves the performance of both the reverberator and the dereverberator

02 Output audio better matches the visual scene

03 Remains effective when trained with unpaired data

Abstract

Visual acoustic matching (VAM) is pivotal for enhancing the immersive experience, and the task of dereverberation is effective in improving audio intelligibility. Existing methods treat each task independently, overlooking the inherent reciprocity between them. Moreover, these methods depend on paired training data, which is challenging to acquire, impeding the utilization of extensive unpaired data. In this paper, we introduce MVSD, a mutual learning framework based on diffusion models. MVSD considers the two tasks symmetrically, exploiting the reciprocal relationship to facilitate learning from inverse tasks and overcome data scarcity. Furthermore, we employ the diffusion model as foundational conditional converters to circumvent the training instability and over-smoothing drawbacks of conventional GAN architectures. Specifically, MVSD employs two converters: one for VAM called…

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Video Analysis and Summarization

Methods: Diffusion