AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis
Swapnil Bhosale, Haosen Yang, Diptesh Kanojia, Jiankang Deng, Xiatian Zhu

TL;DR
This paper introduces AV-GS, a novel scene-aware model for synthesizing binaural audio from a single mono source, leveraging explicit geometry and material information for improved realism and efficiency.
Contribution
The paper proposes an explicit point-based scene representation with audio-guidance, enhancing scene understanding for better audio synthesis compared to NeRF-based methods.
Findings
AV-GS outperforms existing methods on real-world and simulated datasets.
The point densification and pruning strategy improves scene representation efficiency.
AV-GS achieves higher audio synthesis quality with better scene characterization.
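The densification and pruning idea mentioned in the findings can be illustrated with a minimal sketch. This is not the authors' implementation: the per-point `score` field, the thresholds `prune_below` and `densify_above`, and the jittered cloning rule are all assumptions chosen for illustration, loosely following the generic adaptive density control used with point-based scene representations. Points with a low score are dropped, and points with a high score are cloned nearby so that region is represented more densely.

```python
import random
from dataclasses import dataclass


@dataclass
class Point:
    position: tuple  # (x, y, z) location of the point
    score: float     # hypothetical per-point contribution score (an assumption)


def densify_and_prune(points, prune_below=0.05, densify_above=0.8,
                      jitter=0.01, seed=0):
    """Return a new point list: low-score points removed, high-score points cloned.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    updated = []
    for p in points:
        if p.score < prune_below:
            continue  # prune: this point contributes too little to keep
        updated.append(p)
        if p.score > densify_above:
            # densify: add a slightly offset copy next to an important point
            new_pos = tuple(c + rng.uniform(-jitter, jitter) for c in p.position)
            updated.append(Point(new_pos, p.score))
    return updated


points = [
    Point((0.0, 0.0, 0.0), 0.01),  # below prune threshold, removed
    Point((1.0, 0.0, 0.0), 0.50),  # kept unchanged
    Point((0.0, 1.0, 0.0), 0.90),  # kept and cloned
]
updated = densify_and_prune(points)
print(len(points), "->", len(updated))
```

In a trained model the score would come from the optimization itself (for example, an accumulated gradient magnitude) and not from a hand-set value as it is here.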
Abstract
Novel view acoustic synthesis (NVAS) aims to render binaural audio at any target viewpoint, given mono audio emitted by a sound source in a 3D scene. Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing binaural audio. However, in addition to low efficiency originating from heavy NeRF rendering, these methods all have a limited ability to characterize the entire scene environment, such as room geometry, material properties, and the spatial relation between the listener and sound source. To address these issues, we propose a novel Audio-Visual Gaussian Splatting (AV-GS) model. To obtain a material-aware and geometry-aware condition for audio synthesis, we learn an explicit point-based scene representation with an audio-guidance parameter on locally initialized Gaussian points, taking into account the spatial relation from the…
Taxonomy
Topics: Music Technology and Sound Studies
Methods: Pruning
