Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion
Jian Ma, Wenguan Wang, Yi Yang, Feng Zheng

TL;DR
This paper introduces MVSD, a diffusion-based mutual learning framework that jointly improves visual acoustic matching and dereverberation by exploiting their reciprocal relationship and learning from unpaired data, which improves audio-visual scene consistency.
Contribution
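The paper proposes a mutual learning framework that uses diffusion models as conditional converters for joint acoustic matching and dereverberation. It addresses the scarcity of paired training data and avoids the training instability and over-smoothing associated with GAN-based methods.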
Findings
Mutual learning improves the performance of both the reverberator and the dereverberator
Output audio better matches the visual scene
The framework remains effective when trained with unpaired data
Abstract
Visual acoustic matching (VAM) is pivotal for enhancing the immersive experience, and the task of dereverberation is effective in improving audio intelligibility. Existing methods treat each task independently, overlooking the inherent reciprocity between them. Moreover, these methods depend on paired training data, which is challenging to acquire, impeding the utilization of extensive unpaired data. In this paper, we introduce MVSD, a mutual learning framework based on diffusion models. MVSD considers the two tasks symmetrically, exploiting the reciprocal relationship to facilitate learning from inverse tasks and overcome data scarcity. Furthermore, we employ the diffusion model as foundational conditional converters to circumvent the training instability and over-smoothing drawbacks of conventional GAN architectures. Specifically, MVSD employs two converters: one for VAM called…
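The reciprocity between the two tasks can be illustrated with a toy signal-processing example. This is a minimal sketch, not MVSD's method or training objective: the converters here are a plain convolution and a regularized inverse filter rather than diffusion models, and the names `reverberate`, `dereverberate`, and `rir`, as well as the synthetic impulse response standing in for visual scene conditioning, are assumptions made for illustration. It shows that passing audio through one task and then its inverse yields a reconstruction error that can be computed from a single unpaired recording.

```python
import numpy as np

def reverberate(clean, rir):
    # Forward task (toy stand-in for acoustic matching):
    # add room acoustics by linear convolution with an impulse response.
    return np.convolve(clean, rir)

def dereverberate(reverberant, rir, n_samples, eps=1e-8):
    # Inverse task (toy stand-in for dereverberation):
    # regularized inverse filter applied in the frequency domain.
    size = len(reverberant)
    R = np.fft.rfft(reverberant, size)
    H = np.fft.rfft(rir, size)
    S = R * np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.fft.irfft(S, size)[:n_samples]

rng = np.random.default_rng(0)
clean = rng.standard_normal(1024)   # one unpaired "dry" recording (synthetic)
rir = 0.6 ** np.arange(32)          # hypothetical decaying impulse response

# Cycle: dry -> reverberant -> dry. No paired target recording is needed
# to measure how well the two tasks invert each other.
reverberant = reverberate(clean, rir)
recovered = dereverberate(reverberant, rir, len(clean))
cycle_error = float(np.mean((recovered - clean) ** 2))
print(f"cycle reconstruction MSE: {cycle_error:.2e}")
```

In a learned setting the two functions would be replaced by trainable converters, and an error of this kind is one generic way a pair of inverse tasks can supervise each other on unpaired audio.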
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Video Analysis and Summarization
Methods: Diffusion
