MAGIC: Map-Guided Few-Shot Audio-Visual Acoustics Modeling

Diwei Huang; Kunyang Lin; Peihao Chen; Qing Du; Mingkui Tan

arXiv:2405.13860 · cs.CV · May 24, 2024

Open Access

TL;DR

This paper introduces a map-guided framework for few-shot audio-visual acoustics modeling, leveraging semantic feature maps and transformer-based encoding to accurately synthesize room impulse responses with limited data.

Contribution

The paper proposes a map-guided approach that constructs semantic feature maps of a scene and uses diffusion and transformer models to improve acoustic scene understanding.

Findings

01. Effective in synthesizing RIR with limited observations

02. Outperforms baseline methods on Matterport3D and Replica datasets

03. Demonstrates the importance of semantic maps in acoustic modeling

Abstract

Few-shot audio-visual acoustics modeling seeks to synthesize the room impulse response in arbitrary locations with few-shot observations. To sufficiently exploit the provided few-shot data for accurate acoustic modeling, we present a *map-guided* framework by constructing acoustic-related visual semantic feature maps of the scenes. Visual features preserve semantic details related to sound and maps provide explicit structural regularities of sound propagation, which are valuable for modeling environment acoustics. We thus extract pixel-wise semantic features derived from observations and project them into a top-down map, namely the **observation semantic map**. This map contains the relative positional information among points and the semantic feature information associated with each point. Yet, limited information extracted by few-shot observations on the map is not sufficient for…
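The projection step described in the abstract (pixel-wise features placed onto a top-down map) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a pinhole camera with its optical centre at the image centre, a depth image aligned with the feature image, and mean pooling of features per map cell. The function name `project_to_top_down_map` and the parameters `focal`, `map_size`, and `cell_size` are hypothetical.

```python
import numpy as np

def project_to_top_down_map(features, depth, focal, map_size=8, cell_size=0.5):
    """Average per-pixel features into a top-down grid (illustrative sketch only).

    features: (H, W, C) per-pixel feature vectors (assumed already extracted).
    depth:    (H, W) forward distance of each pixel in metres; 0 means invalid.
    focal:    focal length in pixels (assumed pinhole camera).
    Returns the (map_size, map_size, C) mean feature per cell and the
    (map_size, map_size) number of pixels that landed in each cell.
    """
    h, w, c = features.shape
    # Horizontal pixel offset from the (assumed) optical centre.
    u = np.broadcast_to(np.arange(w) - (w - 1) / 2.0, (h, w))
    # Pinhole back-projection: lateral offset x = u * z / f, forward distance z.
    z = depth
    x = u * z / focal
    # The camera sits at row 0, centre column; rows grow with forward distance.
    row = np.floor(z / cell_size).astype(int)
    col = np.floor(x / cell_size).astype(int) + map_size // 2
    valid = (depth > 0) & (row >= 0) & (row < map_size) & (col >= 0) & (col < map_size)
    flat = row[valid] * map_size + col[valid]
    # Unbuffered accumulation so that repeated cell indices are all counted.
    sums = np.zeros((map_size * map_size, c))
    counts = np.zeros(map_size * map_size)
    np.add.at(sums, flat, features[valid])
    np.add.at(counts, flat, 1)
    mean = sums / np.maximum(counts, 1)[:, None]
    return mean.reshape(map_size, map_size, c), counts.reshape(map_size, map_size)
```

Each cell of the resulting grid keeps both a position (its row and column relative to the camera) and an averaged feature vector, which mirrors the two kinds of information the abstract attributes to the observation semantic map. Cells with a count of zero were never observed, which is the sparsity that the few-shot setting has to cope with.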

Taxonomy

Topics: Music and Audio Processing