Building Audio-Visual Digital Twins with Smartphones
Zitong Lan, Yiwei Tang, Yuhan Wang, Haowen Lai, Yiduo Hao, Mingmin Zhao

TL;DR
This paper presents AV-Twin, a practical system that creates editable audio-visual digital twins of real environments using only smartphones, integrating acoustics and visuals for enhanced spatial realism and interaction.
Contribution
It introduces a novel mobile-based approach combining acoustic and visual modeling to construct fully modifiable digital twins of real-world spaces.
Findings
Reconstructs room acoustics from room impulse responses (RIRs) recorded on smartphones
Enables editing of materials, geometry, and layout
Automatically updates audio and visual outputs after modifications
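For readers unfamiliar with RIRs: a room impulse response describes how a room transforms a sound between a source and a listener, and the standard way to apply one is to convolve a dry (echo-free) signal with it. The sketch below illustrates only that general idea with a toy three-sample RIR; the function name `render_with_rir` and all values are hypothetical and are not taken from AV-Twin.

```python
def render_with_rir(dry, rir):
    """Convolve a dry signal with a room impulse response (direct form)."""
    out = [0.0] * (len(dry) + len(rir) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(rir):
            # Each input sample excites a scaled, delayed copy of the RIR.
            out[i + j] += x * h
    return out


# Toy RIR (illustrative values): direct sound at sample 0, plus one
# reflection at half amplitude arriving two samples later.
toy_rir = [1.0, 0.0, 0.5]
dry_signal = [1.0, 2.0]
wet_signal = render_with_rir(dry_signal, toy_rir)
```

Real RIRs are thousands of samples long, so practical renderers typically use FFT-based convolution rather than this direct double loop.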
Abstract
Digital twins today are almost entirely visual, overlooking acoustics, a core component of spatial realism and interaction. We introduce AV-Twin, the first practical system that constructs editable audio-visual digital twins using only commodity smartphones. AV-Twin combines mobile RIR capture and a visual-assisted acoustic field model to efficiently reconstruct room acoustics. It further recovers per-surface material properties through differentiable acoustic rendering, enabling users to modify materials, geometry, and layout while automatically updating both audio and visuals. Together, these capabilities establish a practical path toward fully modifiable audio-visual digital twins for real-world environments.
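To give a feel for why per-surface material properties matter when a scene is edited, here is a minimal sketch using the classical Sabine reverberation formula, RT60 = 0.161 V / Σ(S_i · α_i). This is a textbook statistical approximation swapped in for illustration, not the differentiable acoustic renderer described in the abstract. The room dimensions, the absorption coefficients, and the name `sabine_rt60` are assumptions.

```python
SABINE_CONSTANT = 0.161  # s/m, for metric units


def sabine_rt60(volume_m3, surfaces):
    """Estimate reverberation time (seconds) with Sabine's formula.

    surfaces: list of (area_m2, absorption_coefficient) pairs.
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    if total_absorption <= 0:
        raise ValueError("total absorption must be positive")
    return SABINE_CONSTANT * volume_m3 / total_absorption


# Hypothetical 5 m x 4 m x 3 m room; coefficients are illustrative only.
volume = 5 * 4 * 3            # 60 m^3
floor_area = 5 * 4            # 20 m^2
ceiling_area = 5 * 4          # 20 m^2
wall_area = 2 * (5 * 3) + 2 * (4 * 3)  # 54 m^2

# Before the edit: a hard floor (assumed alpha = 0.10).
rt60_before = sabine_rt60(
    volume, [(floor_area, 0.10), (ceiling_area, 0.10), (wall_area, 0.05)]
)

# After the edit: the floor is swapped for carpet (assumed alpha = 0.30).
rt60_after = sabine_rt60(
    volume, [(floor_area, 0.30), (ceiling_area, 0.10), (wall_area, 0.05)]
)
```

Changing a single surface's absorption coefficient shortens the estimated reverberation time, which is the kind of audio update a material edit in a digital twin would need to reflect.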
Taxonomy
Topics: Music Technology and Sound Studies · Tactile and Sensory Interactions · Hearing Loss and Rehabilitation
