Spatial Audio Rendering for Real-Time Speech Translation in Virtual Meetings
Margarita Geleta, Hong Sodoma, Hannes Gamper

TL;DR
This study shows that rendering translated speech with spatial audio significantly improves listener comprehension, engagement, and perceived clarity in real-time multilingual virtual meetings, easing cross-language communication.
Contribution
The study introduces spatial audio cues into real-time speech translation and shows that they improve both understanding and user experience in virtual meetings.
Findings
Spatial audio doubled comprehension accuracy.
Participants reported higher clarity and engagement with spatial cues.
Spatial rendering improved overall user satisfaction.
Abstract
Language barriers in virtual meetings remain a persistent challenge to global collaboration. Real-time translation offers promise, yet current integrations often neglect perceptual cues. This study investigates how spatial audio rendering of translated speech influences comprehension, cognitive load, and user experience in multilingual meetings. We conducted a within-subjects experiment with 8 bilingual confederates and 47 participants simulating global team meetings with English translations of Greek, Kannada, Mandarin Chinese, and Ukrainian, languages selected for their diversity in grammar, script, and resource availability. Participants experienced four audio conditions: spatial audio with and without background reverberation, and two non-spatial configurations (diotic, monaural). We measured listener comprehension accuracy, workload ratings, satisfaction scores, and qualitative…
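To make the difference between the listening conditions concrete, here is a minimal sketch of how a mono translated-speech signal could be presented monaurally, diotically, or with a crude spatial placement. It is not the authors' renderer: practical spatial audio normally uses head-related transfer functions (HRTFs), whereas this sketch substitutes an interaural time difference from Woodworth's spherical-head formula plus constant-power level panning. The sample rate, head radius, the choice of the left ear for the monaural case, and all function names are assumptions for illustration, and the reverberation condition is not modeled.

```python
import math
from typing import List, Tuple

# Assumed parameters (illustrative, not from the study)
SAMPLE_RATE = 16_000    # Hz
HEAD_RADIUS = 0.0875    # m, typical adult head radius
SPEED_OF_SOUND = 343.0  # m/s

Stereo = Tuple[List[float], List[float]]  # (left ear, right ear)


def monaural(signal: List[float]) -> Stereo:
    """Present the signal to one ear only (assumed: left); the other ear gets silence."""
    return list(signal), [0.0] * len(signal)


def diotic(signal: List[float]) -> Stereo:
    """Present the identical signal to both ears, so it is heard 'inside the head'."""
    return list(signal), list(signal)


def woodworth_itd(azimuth_rad: float) -> float:
    """Interaural time difference in seconds from Woodworth's spherical-head formula."""
    return HEAD_RADIUS / SPEED_OF_SOUND * (azimuth_rad + math.sin(azimuth_rad))


def spatialize(signal: List[float], azimuth_deg: float) -> Stereo:
    """Crude binaural placement: ITD delay on the far ear plus constant-power panning.

    azimuth_deg: 0 = straight ahead, +90 = hard right, -90 = hard left (clamped).
    A stand-in for HRTF rendering, used here only to illustrate the idea.
    """
    az = math.radians(max(-90.0, min(90.0, azimuth_deg)))
    # Map azimuth to a pan angle in [0, pi/2]; cos/sin gains keep total power constant
    pan = (az + math.pi / 2) / 2
    g_left, g_right = math.cos(pan), math.sin(pan)
    # Delay the ear farther from the source by the ITD, rounded to whole samples
    delay = round(abs(woodworth_itd(az)) * SAMPLE_RATE)
    delayed = [0.0] * delay + list(signal)
    padded = list(signal) + [0.0] * delay  # keep both channels the same length
    if az >= 0:  # source on the right, so the left ear is the far ear
        left, right = delayed, padded
    else:
        left, right = padded, delayed
    return [g_left * s for s in left], [g_right * s for s in right]
```

In a multi-speaker meeting, each translated voice could be passed through `spatialize` at its own azimuth (for example -60, -20, 20, and 60 degrees; hypothetical positions, not values reported by the study) and the channels summed, so that listeners can separate talkers by direction. The diotic and monaural functions give every voice the same apparent location, which is the cue the spatial conditions add.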
