A tunable binaural audio telepresence system capable of balancing immersive and enhanced modes
Yicheng Hsu, Mingsian R. Bai

TL;DR
This paper introduces a tunable binaural audio telepresence system that balances immersive and enhanced modes, using a novel spatial coherence feature to improve robustness across different microphone array configurations.
Contribution
It proposes a new tunable BAT system with a novel SCORE feature, enabling flexible balancing between immersive and enhanced audio modes.
Findings
Superior performance in diverse array configurations
Robustness of the system across different setups
Effective balancing between immersive and enhanced modes
Abstract
Binaural Audio Telepresence (BAT) aims to encode the acoustic scene at the far end into binaural signals for the user at the near end. BAT encompasses an immense range of applications that can vary between two extreme modes of Immersive BAT (I-BAT) and Enhanced BAT (E-BAT). With I-BAT, our goal is to preserve the full ambience as if we were at the far end, while with E-BAT, our goal is to enhance the far-end conversation with significantly improved speech quality and intelligibility. To this end, this paper presents a tunable BAT system to vary between these two BAT modes with a desired application-specific balance. Microphone signals are converted into binaural signals with a prescribed ambience factor. A novel Spatial COherence REpresentation (SCORE) is proposed as an input feature for model training so that the network remains robust to different array setups. Experimental results…
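As a rough illustration of the two ideas in the abstract, the sketch below shows (a) one way an ambience factor could trade off between the enhanced and immersive extremes, and (b) a generic pairwise spatial coherence computed from multichannel STFT data. Both are assumptions made for illustration only: the abstract does not define the ambience factor or SCORE, so the linear mixing rule, the 0-to-1 range, and the function names `mix_with_ambience` and `pairwise_coherence` are hypothetical and are not the authors' method.

```python
import numpy as np

def mix_with_ambience(speech_bin, ambience_bin, ambience_factor):
    """Blend binaural speech with binaural ambience.

    Assumed convention (not from the paper): factor 0 keeps speech only
    (enhanced extreme), factor 1 keeps the full ambience (immersive extreme).
    """
    if not 0.0 <= ambience_factor <= 1.0:
        raise ValueError("ambience_factor must lie in [0, 1]")
    return speech_bin + ambience_factor * ambience_bin

def pairwise_coherence(X, eps=1e-12):
    """Generic complex coherence for every microphone pair.

    X: complex STFT array of shape (mics, frames, freqs).
    Returns an array of shape (pairs, freqs). This is a textbook
    coherence estimate, not the SCORE feature itself.
    """
    n_mics = X.shape[0]
    rows = []
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            # Cross-spectrum averaged over time frames.
            cross = np.mean(X[i] * np.conj(X[j]), axis=0)
            # Geometric mean of the two auto-spectra for normalization.
            power = np.sqrt(
                np.mean(np.abs(X[i]) ** 2, axis=0)
                * np.mean(np.abs(X[j]) ** 2, axis=0)
            )
            rows.append(cross / (power + eps))
    return np.stack(rows)
```

Normalized inter-microphone statistics of this kind are sometimes preferred over raw multichannel spectra because they depend less on absolute signal level, which is one plausible reason a coherence-based feature could help a network generalize across array setups.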
Taxonomy
Topics: Speech and Audio Processing
