Enhancing XR Auditory Realism via Multimodal Scene-Aware Acoustic Rendering

Tianyu Xu; Jihan Li; Penghe Zu; Pranav Sahay; Maruchi Kim; Jack Obeng-Marnu; Farley Miller; Xun Qian; Katrina Passarella; Mahitha Rachumalla; Rajeev Nongpiur; D. Shin

arXiv:2511.11930 · cs.HC · November 18, 2025

Open Access

TL;DR

This paper presents SAMOSA, an on-device system that dynamically renders spatially accurate sound in XR by fusing multimodal scene data, significantly improving auditory realism and user immersion.

Contribution

SAMOSA introduces a multimodal scene-aware acoustic rendering approach that adapts in real time to the physical environment to make XR sound more realistic.

Findings

01. SAMOSA achieves high accuracy in RIR synthesis across diverse rooms.
02. Expert evaluations confirm improved auditory realism.
03. The system operates efficiently on-device in real time.

Abstract

In Extended Reality (XR), rendering sound that accurately simulates real-world acoustics is pivotal in creating lifelike and believable virtual experiences. However, existing XR spatial audio rendering methods often struggle with real-time adaptation to diverse physical scenes, causing a sensory mismatch between visual and auditory cues that disrupts user immersion. To address this, we introduce SAMOSA, a novel on-device system that renders spatially accurate sound by dynamically adapting to its physical environment. SAMOSA leverages a synergistic multimodal scene representation by fusing real-time estimations of room geometry, surface materials, and semantic-driven acoustic context. This rich representation then enables efficient acoustic calibration via scene priors, allowing the system to synthesize a highly realistic Room Impulse Response (RIR). We validate our system through…
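
To make the link between scene estimates and an RIR concrete, here is a minimal sketch of a classical baseline, not SAMOSA's method: Sabine's formula turns room volume and per-surface absorption into a reverberation time (RT60), and a toy RIR tail is then built as exponentially decaying noise. The room dimensions, absorption coefficients, and the function names `sabine_rt60` and `synth_rir` are illustrative assumptions, not values or APIs from the paper.

```python
import math
import random


def sabine_rt60(volume_m3, surfaces):
    """Estimate RT60 in seconds with Sabine's formula.

    surfaces: list of (area_m2, absorption_coefficient) pairs.
    """
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    if total_absorption <= 0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / total_absorption


def synth_rir(rt60_s, sample_rate=16000, seed=0):
    """Toy late-reverb tail: white noise under an exponential envelope."""
    rng = random.Random(seed)
    n_samples = int(rt60_s * sample_rate)
    # Amplitude falls by 60 dB (a factor of 1000) after rt60_s seconds.
    decay = math.log(1000.0) / rt60_s
    return [
        rng.gauss(0.0, 1.0) * math.exp(-decay * i / sample_rate)
        for i in range(n_samples)
    ]


# Hypothetical 5 m x 4 m x 3 m room; coefficients are illustrative only.
length, width, height = 5.0, 4.0, 3.0
volume = length * width * height
surfaces = [
    (length * width, 0.30),                 # floor, assumed carpeted
    (length * width, 0.05),                 # ceiling, assumed hard
    (2 * (length + width) * height, 0.05),  # walls, assumed hard
]

rt60 = sabine_rt60(volume, surfaces)
rir = synth_rir(rt60)
```

Swapping the assumed floor coefficient for a harder material lowers the total absorption and lengthens the tail, which is the kind of material dependence a scene-aware renderer has to capture. A realistic RIR also needs early reflections and frequency-dependent absorption, both of which this sketch omits.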

Taxonomy

Topics: Hearing Loss and Rehabilitation · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis