Hearing Anything Anywhere

Mason Wang; Ryosuke Sawata; Samuel Clarke; Ruohan Gao; Shangzhe Wu; Jiajun Wu

arXiv:2406.07532·cs.SD·June 12, 2024

TL;DR

DiffRIR is a differentiable room impulse response (RIR) rendering framework that reconstructs a scene's spatial acoustic characteristics from a sparse set of roughly 12 RIR recordings and a planar scene reconstruction, enabling realistic auditory scene synthesis for immersive mixed reality experiences.

Contribution

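We introduce DiffRIR, a differentiable RIR rendering model built on interpretable parametric models of salient acoustic features, including sound source directivity and surface reflectivity. It reconstructs acoustic environments from a sparse set of recordings and synthesizes novel auditory experiences.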

Findings

01 Outperforms state-of-the-art methods in RIR and music rendering

02 Learns physically interpretable acoustic parameters

03 Works with only sparse recordings and a planar scene reconstruction

Abstract

Recent years have seen immense progress in 3D computer vision and computer graphics, with emerging tools that can virtualize real-world 3D environments for numerous Mixed Reality (XR) applications. However, alongside immersive visual experiences, immersive auditory experiences are equally vital to our holistic perception of an environment. In this paper, we aim to reconstruct the spatial acoustic characteristics of an arbitrary environment given only a sparse set of (roughly 12) room impulse response (RIR) recordings and a planar reconstruction of the scene, a setup that is easily achievable by ordinary users. To this end, we introduce DiffRIR, a differentiable RIR rendering framework with interpretable parametric models of salient acoustic features of the scene, including sound source directivity and surface reflectivity. This allows us to synthesize novel auditory experiences through…
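
For readers unfamiliar with room impulse responses: an RIR describes how a room transforms sound between a source and a listener, and applying an RIR to a dry (echo-free) recording amounts to a convolution. The sketch below illustrates only that general relationship with NumPy. It is not the DiffRIR code, and the function name `render_with_rir` and the toy RIR values are hypothetical, chosen purely for illustration.

```python
import numpy as np


def render_with_rir(dry, rir):
    """Convolve a dry signal with a room impulse response (RIR).

    Hypothetical helper for illustration; not part of the DiffRIR repository.
    """
    dry = np.asarray(dry, dtype=np.float64)
    rir = np.asarray(rir, dtype=np.float64)
    # Full convolution: output length is len(dry) + len(rir) - 1.
    return np.convolve(dry, rir, mode="full")


# Toy RIR (made-up values): direct sound at sample 0,
# plus a single reflection arriving 3 samples later at half amplitude.
toy_rir = np.array([1.0, 0.0, 0.0, 0.5])

# Toy dry signal: two samples.
dry_signal = np.array([1.0, 2.0])

wet_signal = render_with_rir(dry_signal, toy_rir)
print(wet_signal)  # [1.  2.  0.  0.5 1. ]
```

The output contains the original two samples followed by a delayed, attenuated copy, which is the echo contributed by the single reflection. A measured or estimated RIR would have many more taps, but the rendering step is the same operation.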

Code & Models

Repositories

maswang32/hearinganythinganywhere (PyTorch, official)

Taxonomy

Topics: Hearing Loss and Rehabilitation · Music Technology and Sound Studies · Speech and Audio Processing

Methods: Sparse Evolutionary Training