Hearing in a shoe-box : binaural source position and wall absorption estimation using virtually supervised learning
Saurabh Kataria (IIT Kanpur, Panama), Clément Gaultier (Panama), Antoine Deleforge (Panama)

TL;DR
This paper presents a virtually-supervised learning framework for binaural sound source localization and wall absorption estimation using simulated acoustic scenes, achieving accurate predictions of source positions and acoustic properties from binaural signals.
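The abstract does not specify which spatial binaural features are extracted from the two ear signals. The sketch below assumes frame-averaged interaural level differences (ILD, in dB) and interaural phase differences (IPD, encoded as cosine/sine pairs to avoid phase wrapping), computed per frequency bin from a Hann-windowed short-time Fourier transform. The function names, window length and hop size are hypothetical illustration choices, not the authors' settings.

```python
import numpy as np

# Hypothetical feature extractor: the ILD/IPD choice and STFT settings are
# illustrative assumptions, not taken from the paper.

def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT; returns a complex array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def binaural_features(left, right, n_fft=512, hop=256, eps=1e-12):
    """Concatenate per-bin mean ILD (dB) with per-bin mean cos(IPD) and sin(IPD)."""
    L = stft(left, n_fft, hop)
    R = stft(right, n_fft, hop)
    # Interaural level difference in dB, averaged over time frames
    ild = np.mean(20 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps)), axis=0)
    # Interaural phase difference, mapped onto the unit circle before averaging
    ipd = np.angle(L * np.conj(R))
    ipd_feat = np.concatenate([np.mean(np.cos(ipd), axis=0), np.mean(np.sin(ipd), axis=0)])
    return np.concatenate([ild, ipd_feat])
```

With `n_fft=512` there are 257 frequency bins, so each scene maps to a 771-dimensional vector (257 ILD values plus 514 IPD values). This is the kind of high-dimensional input the regression step then maps to a handful of acoustic parameters.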
Contribution
It introduces a novel virtually-supervised learning approach with an acoustic room simulator to estimate source location and wall absorption from binaural audio.
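The abstract does not name the acoustic simulator. Shoe-box rooms are classically simulated with the image-source method, so the sketch below assumes that method. It mirrors the source across each of the six walls and scales each first-order reflection by sqrt(1 - alpha), where alpha is that wall's energy absorption coefficient. For simplicity it keeps only first-order reflections, uses omnidirectional microphones instead of head-related transfer functions, and rounds delays to whole samples. All names and parameters are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed constant

def first_order_images(src, room, alphas):
    """Direct source plus its six first-order mirror images in a shoe-box room.

    src: (x, y, z) source position in metres; room: (Lx, Ly, Lz) dimensions.
    alphas: energy absorption coefficients, ordered
            (x=0, x=Lx, y=0, y=Ly, z=0, z=Lz).
    Returns a list of (image_position, pressure_reflection_coefficient).
    """
    src = np.asarray(src, dtype=float)
    images = [(src, 1.0)]
    for axis in range(3):
        for side in range(2):
            img = src.copy()
            wall = 0.0 if side == 0 else room[axis]
            img[axis] = 2 * wall - src[axis]  # mirror the source across the wall plane
            beta = np.sqrt(1.0 - alphas[2 * axis + side])  # amplitude reflection factor
            images.append((img, beta))
    return images

def impulse_response(src, mic, room, alphas, fs=16000, length=0.1):
    """Sparse room impulse response at one omnidirectional microphone."""
    h = np.zeros(int(fs * length))
    for img, beta in first_order_images(src, room, alphas):
        dist = np.linalg.norm(img - np.asarray(mic, dtype=float))
        delay = int(round(dist / SPEED_OF_SOUND * fs))  # nearest-sample delay
        if delay < len(h):
            h[delay] += beta / (4 * np.pi * dist)  # spherical spreading loss
    return h
```

Calling `impulse_response` once per ear position gives a crude two-channel pair. Convolving a dry signal with each response then yields a toy annotated scene whose labels are simply the sampled `src` and `alphas`. Repeating this over many random rooms is the "virtual supervision" idea in miniature.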
Findings
Accurately estimates azimuth and elevation of sound sources.
Successfully predicts source range and wall absorption coefficients.
Incorporating random-diffusion effects in the simulated data significantly improves the estimation of all parameters.
Abstract
This paper introduces a new framework for supervised sound source localization referred to as virtually-supervised learning. An acoustic shoe-box room simulator is used to generate a large number of binaural single-source audio scenes. These scenes are used to build a dataset of spatial binaural features annotated with acoustic properties such as the 3D source position and the walls' absorption coefficients. A probabilistic high- to low-dimensional regression framework is used to learn a mapping from these features to the acoustic properties. Results indicate that this mapping successfully estimates not only the azimuth and elevation of new sources, but also their range and even the walls' absorption coefficients, solely from binaural signals. Results also reveal that incorporating random-diffusion effects in the data significantly improves the estimation of all parameters.
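The abstract does not detail the regression model beyond calling it a probabilistic high- to low-dimensional framework. The sketch below is a deliberately simplified stand-in: it fits a single joint Gaussian over (labels x, features y) and predicts the posterior mean E[x | y] by Gaussian conditioning. Because a single Gaussian only captures a linear relation, a realistic model of the nonlinear feature-to-position mapping would need something richer, for example a mixture of such locally linear components. The class name and the `reg` parameter are hypothetical.

```python
import numpy as np

class JointGaussianRegressor:
    """Single-component sketch: fit p(x, y) as one Gaussian, predict E[x | y].

    x: low-dimensional labels (e.g. source position, absorption coefficients).
    y: high-dimensional binaural feature vectors.
    """

    def fit(self, X, Y, reg=1e-6):
        # X: (N, L) labels, Y: (N, D) features with D >> L
        Z = np.hstack([X, Y])
        self.L = X.shape[1]
        self.mu = Z.mean(axis=0)
        # Small diagonal term keeps the feature covariance invertible
        C = np.cov(Z, rowvar=False) + reg * np.eye(Z.shape[1])
        self.Cxy = C[: self.L, self.L :]  # (L, D) cross-covariance
        self.Cyy = C[self.L :, self.L :]  # (D, D) feature covariance
        return self

    def predict(self, Y):
        # E[x | y] = mu_x + Cxy Cyy^{-1} (y - mu_y), applied to each row of Y
        mu_x, mu_y = self.mu[: self.L], self.mu[self.L :]
        W = np.linalg.solve(self.Cyy, self.Cxy.T)  # (D, L), uses symmetry of Cyy
        return mu_x + (Y - mu_y) @ W
```

For example, training on synthetic pairs where 20-dimensional features are a noisy linear function of 2-dimensional labels recovers the held-out labels almost exactly. Features from real rooms would require the nonlinear, mixture-based extension mentioned above.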
