Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark

Ziyang Chen; Israel D. Gebru; Christian Richardt; Anurag Kumar; William Laney; Andrew Owens; Alexander Richard

arXiv:2403.18821·cs.SD·March 28, 2024·1 cites

Open Access

TL;DR

This paper introduces the Real Acoustic Fields (RAF) dataset, a comprehensive real-world audio-visual room acoustics dataset with high-quality impulse responses, multi-view images, and pose data, enabling improved evaluation and development of neural acoustic models.

Contribution

The paper provides the first densely captured real-world acoustic dataset with multi-modal data. It evaluates existing models on this data, proposes settings that enhance their performance, and demonstrates a sim2real approach for better real-world performance.

Findings

01

Existing models perform better with real-world data after fine-tuning.

02

Incorporating visual data improves neural acoustic field models.

03

Pre-training on simulated data and fine-tuning enhances few-shot learning.

Abstract

We present a new dataset called Real Acoustic Fields (RAF) that captures real acoustic room data from multiple modalities. The dataset includes high-quality and densely captured room impulse response data paired with multi-view images, and precise 6DoF pose tracking data for sound emitters and listeners in the rooms. We used this dataset to evaluate existing methods for novel-view acoustic synthesis and impulse response generation which previously relied on synthetic data. In our evaluation, we thoroughly assessed existing audio and audio-visual models against multiple criteria and proposed settings to enhance their performance on real-world data. We also conducted experiments to investigate the impact of incorporating visual data (i.e., images and depth) into neural acoustic field models. Additionally, we demonstrated the effectiveness of a simple sim2real approach, where a model is…
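As background for readers unfamiliar with room impulse responses: an impulse response describes how a room transforms a sound on its way from an emitter to a listener, and applying it to a "dry" recording amounts to a convolution. The sketch below illustrates only that general idea. It is not the paper's method or the dataset's API; the function name `convolve` and the toy values in `toy_rir` are hypothetical and chosen purely for illustration.

```python
from typing import List


def convolve(signal: List[float], rir: List[float]) -> List[float]:
    """Full linear convolution of a dry signal with an impulse response."""
    if not signal or not rir:
        return []
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            # Each input sample excites a scaled, delayed copy of the response.
            out[i + j] += s * h
    return out


# Toy impulse response (made-up values): a direct path at sample 0
# and a single reflection at half amplitude, three samples later.
toy_rir = [1.0, 0.0, 0.0, 0.5]
dry = [1.0, 2.0]
wet = convolve(dry, toy_rir)
print(wet)
```

The direct double loop keeps the example dependency-free; for impulse responses of realistic length, an FFT-based convolution would normally be used instead.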

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing