Stereo Audio Rendering for Personal Sound Zones Using a Binaural Spatially Adaptive Neural Network (BSANN)
Hao Jiang, Edgar Choueiri

TL;DR
This paper introduces a binaural neural network framework for personal sound zones that enables independent stereo audio rendering for multiple listeners, significantly improving spatial accuracy, isolation, and robustness in real environments.
Contribution
It presents a novel BSANN-based method integrating acoustic modeling and active crosstalk cancellation for enhanced spatial audio in personal sound zones.
Findings
Improved inter-zone and inter-program isolation metrics.
Enhanced crosstalk cancellation performance.
Greater robustness to room asymmetry.
Abstract
A binaural rendering framework for personal sound zones (PSZs) is proposed to enable multiple head-tracked listeners to receive fully independent stereo audio programs. Current PSZ systems typically rely on monophonic rendering and therefore cannot control the left and right ears separately, which limits the quality and accuracy of spatial imaging. The proposed method employs a Binaural Spatially Adaptive Neural Network (BSANN) to generate ear-optimized loudspeaker filters that reconstruct the desired acoustic field at each ear of multiple listeners. The framework integrates anechoically measured loudspeaker frequency responses, analytically modeled transducer directivity, and rigid-sphere head-related transfer functions (HRTFs) to enhance acoustic accuracy and spatial rendering fidelity. An explicit active crosstalk cancellation (XTC) stage further improves three-dimensional spatial…
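For readers unfamiliar with crosstalk cancellation, the idea can be illustrated with a minimal sketch. This is not the BSANN method or the paper's XTC stage, whose details are not given in the abstract; it is the generic textbook approach of a Tikhonov-regularized inverse of the 2x2 loudspeaker-to-ear transfer matrix at a single frequency. The free-field point-source model, the geometry, the frequency, the regularization value `BETA`, and all function names are assumptions chosen only for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)

def free_field_tf(src, rcv, freq_hz):
    """Free-field point-source transfer function exp(-jkr)/r (assumed model)."""
    r = math.dist(src, rcv)
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    return cmath.exp(-1j * k * r) / r

def plant_matrix(speakers, ears, freq_hz):
    """2x2 matrix C with C[e][s] = transfer function from speaker s to ear e."""
    return [[free_field_tf(s, e, freq_hz) for s in speakers] for e in ears]

def matmul2(a, b):
    """Product of two 2x2 complex matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def hermitian2(a):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def inverse2(a):
    """Closed-form inverse of a 2x2 complex matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def xtc_filters(c, beta):
    """Tikhonov-regularized inverse H = (C^H C + beta I)^-1 C^H."""
    ch = hermitian2(c)
    gram = matmul2(ch, c)
    gram[0][0] += beta
    gram[1][1] += beta
    return matmul2(inverse2(gram), ch)

def crosstalk_db(c, h):
    """Unwanted path level relative to the wanted path at the left ear, in dB."""
    r = matmul2(c, h)
    return 20.0 * math.log10(max(abs(r[0][1]), 1e-300) / abs(r[0][0]))

# Hypothetical geometry in metres: two loudspeakers 1 m in front of the listener.
SPEAKERS = [(-0.2, 1.0), (0.2, 1.0)]
EARS = [(-0.09, 0.0), (0.09, 0.0)]
FREQ_HZ = 2000.0
BETA = 1e-3  # regularization weight (assumed)

C = plant_matrix(SPEAKERS, EARS, FREQ_HZ)
IDENTITY = [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]
H = xtc_filters(C, BETA)

before_db = crosstalk_db(C, IDENTITY)  # no filtering: natural crosstalk
after_db = crosstalk_db(C, H)          # with regularized inverse filters
```

Comparing `before_db` with `after_db` shows how strongly the inverse filters suppress the path from each loudspeaker to the opposite ear in this idealized setting. A larger `BETA` trades cancellation depth for lower filter gain, which is the usual motivation for regularizing the inverse.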
Taxonomy
Topics: Hearing Loss and Rehabilitation · Speech and Audio Processing · Advanced Adaptive Filtering Techniques
