Blind Estimation of Sub-band Acoustic Parameters from Ambisonics Recordings using Spectro-Spatial Covariance Features
Hanyu Meng, Jeroen Breebaart, Jeremy Stoddard, Vidhyasaharan Sethu, Eliathamby Ambikairajah

TL;DR
This paper presents a novel spectro-spatial covariance feature and a deep learning framework for blind estimation of sub-band acoustic parameters from Ambisonics recordings, reducing estimation errors by more than half relative to single-channel methods that use only spectral information.
Contribution
It introduces the Spectro-Spatial Covariance Vector (SSCV) feature and FOA-Conv3D network, advancing blind acoustic parameter estimation from Ambisonics data.
Findings
Estimation errors for T60, DRR, and C50 are reduced by more than 50% relative to existing single-channel, spectral-only methods.
SSCV feature outperforms spectral-only features.
FOA-Conv3D achieves higher variance explained than CNN and CRNN.
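The "variance explained" comparison above can be illustrated with a small sketch. This assumes the common definition, one minus the ratio of residual variance to target variance; the paper may instead use a closely related score such as the coefficient of determination, so the function name and formula here are illustrative rather than the authors' evaluation code.

```python
import numpy as np

def explained_variance(y_true, y_pred):
    """Fraction of target variance captured by the predictions.

    Assumed definition (not confirmed by the source):
    1 - Var(y_true - y_pred) / Var(y_true). A value of 1.0 means the
    residual has no variance; 0.0 matches predicting a constant.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual_var = np.var(y_true - y_pred)
    return 1.0 - residual_var / np.var(y_true)

# Hypothetical per-band T60 values in seconds, for illustration only.
t60_true = [0.4, 0.6, 0.8, 1.0]
t60_pred = [0.45, 0.55, 0.85, 0.95]
score = explained_variance(t60_true, t60_pred)
```

Note that this form ignores a constant offset in the predictions, which is one reason it is usually reported alongside an absolute error measure.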
Abstract
Estimating frequency-varying acoustic parameters is essential for enhancing immersive perception in realistic spatial audio creation. In this paper, we propose a unified framework that blindly estimates reverberation time (T60), direct-to-reverberant ratio (DRR), and clarity (C50) across 10 frequency bands using first-order Ambisonics (FOA) speech recordings as inputs. The proposed framework utilizes a novel feature named Spectro-Spatial Covariance Vector (SSCV), which efficiently represents the temporal, spectral, and spatial information of the FOA signal. Our models significantly outperform existing single-channel methods that use only spectral information, reducing estimation errors by more than half for all three acoustic parameters. Additionally, we introduce FOA-Conv3D, a novel back-end network that effectively utilizes the SSCV feature with a 3D convolutional encoder. FOA-Conv3D…
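The abstract does not spell out how the SSCV is computed, so the following is only a generic sketch of the underlying idea: a spatial covariance matrix across the four FOA channels, computed per time-frequency bin and averaged over a short window of frames. The function names, STFT settings, and `context` length are all assumptions made for illustration, and the result should not be read as the paper's SSCV definition.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT of a 1-D signal, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)

def spatial_covariance(foa, n_fft=512, hop=256, context=5):
    """Per-bin covariance across FOA channels (hypothetical helper).

    foa: array of shape (4, samples), one row per FOA channel.
    Returns a complex array of shape (frames - context + 1, freqs, 4, 4).
    """
    # Stack the per-channel spectrograms: (frames, freqs, 4).
    spec = np.stack([stft(ch, n_fft, hop) for ch in foa], axis=-1)
    # Outer product of the channel vector with its conjugate in every bin.
    outer = spec[..., :, None] * np.conj(spec[..., None, :])
    # Moving average over `context` consecutive frames.
    n_out = outer.shape[0] - context + 1
    return np.stack([outer[t : t + context].mean(axis=0) for t in range(n_out)])

# Random noise stands in for a real 4-channel FOA recording.
rng = np.random.default_rng(0)
foa = rng.standard_normal((4, 4096))
cov = spatial_covariance(foa)
```

Each 4x4 matrix is Hermitian: its diagonal carries per-channel power (spectral information) and its off-diagonal terms carry inter-channel relations (spatial information), which is the kind of joint representation the abstract attributes to the SSCV.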
Taxonomy
Topics: Blind Source Separation Techniques · Speech and Audio Processing · Image and Signal Denoising Methods
