Blind Estimation of Sub-band Acoustic Parameters from Ambisonics Recordings using Spectro-Spatial Covariance Features

Hanyu Meng; Jeroen Breebaart; Jeremy Stoddard; Vidhyasaharan Sethu; Eliathamby Ambikairajah

arXiv:2411.03172·eess.AS·January 14, 2025

Open Access

TL;DR

This paper presents a novel spectro-spatial covariance feature and a deep learning framework for blind estimation of acoustic parameters from first-order Ambisonics (FOA) speech recordings, reducing estimation errors by more than half relative to single-channel methods that use only spectral information.

Contribution

The paper introduces the Spectro-Spatial Covariance Vector (SSCV) feature and the FOA-Conv3D back-end network for blind acoustic parameter estimation from Ambisonics recordings.

Findings

01. Estimation errors for T60, DRR, and C50 are reduced by more than 50% relative to single-channel, spectral-only methods.

02. The SSCV feature outperforms spectral-only features.

03. FOA-Conv3D achieves higher variance explained than CNN and CRNN.

Abstract

Estimating frequency-varying acoustic parameters is essential for enhancing immersive perception in realistic spatial audio creation. In this paper, we propose a unified framework that blindly estimates reverberation time (T60), direct-to-reverberant ratio (DRR), and clarity (C50) across 10 frequency bands using first-order Ambisonics (FOA) speech recordings as inputs. The proposed framework utilizes a novel feature named Spectro-Spatial Covariance Vector (SSCV), efficiently representing temporal, spectral as well as spatial information of the FOA signal. Our models significantly outperform existing single-channel methods with only spectral information, reducing estimation errors by more than half for all three acoustic parameters. Additionally, we introduce FOA-Conv3D, a novel back-end network for effectively utilising the SSCV feature with a 3D convolutional encoder. FOA-Conv3D…
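For readers unfamiliar with two of the target parameters, the following is a minimal sketch of how broadband T60 and C50 are conventionally computed when the room impulse response (RIR) is known. It is not the paper's method: the paper estimates these values blindly from FOA speech and per frequency band, whereas this sketch assumes direct access to a single-channel RIR and skips the sub-band filtering. The function names, the synthetic exponential RIR, and the -5 dB to -25 dB fitting range are illustrative assumptions.

```python
import numpy as np

def clarity_c50(rir, fs):
    """C50 in dB: energy in the first 50 ms after the direct sound vs. the rest."""
    rir = np.asarray(rir, dtype=float)
    onset = int(np.argmax(np.abs(rir)))        # assumption: direct path is the largest peak
    split = onset + int(round(0.050 * fs))     # 50 ms boundary in samples
    early = np.sum(rir[onset:split] ** 2)
    late = np.sum(rir[split:] ** 2)
    return 10.0 * np.log10(early / late)

def t60_schroeder(rir, fs, lo_db=-5.0, hi_db=-25.0):
    """T60 in seconds from a line fit to the Schroeder energy decay curve."""
    rir = np.asarray(rir, dtype=float)
    edc = np.cumsum(rir[::-1] ** 2)[::-1]      # backward-integrated energy
    edc_db = 10.0 * np.log10(edc / edc[0])     # normalised so the curve starts at 0 dB
    mask = (edc_db <= lo_db) & (edc_db >= hi_db)
    t = np.arange(len(rir)) / fs
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # decay rate in dB per second
    return -60.0 / slope                       # extrapolate the fit to a 60 dB decay

def synthetic_rir(t60, fs, duration=2.0):
    """Hypothetical noise-free RIR: amplitude envelope falling 60 dB in t60 seconds."""
    t = np.arange(int(duration * fs)) / fs
    return np.exp(-3.0 * np.log(10.0) * t / t60)

if __name__ == "__main__":
    fs = 16000
    rir = synthetic_rir(0.5, fs)
    print("T60 (s):", t60_schroeder(rir, fs))
    print("C50 (dB):", clarity_c50(rir, fs))
```

A sub-band variant would apply the same two functions to band-pass filtered copies of the RIR, one per frequency band.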

Taxonomy

Topics: Blind Source Separation Techniques · Speech and Audio Processing · Image and Signal Denoising Methods