Attention Is All You Need For Blind Room Volume Estimation

Chunxi Wang; Maoshen Jia; Meiran Li; Changchun Bao; Wenyu Jin

arXiv:2309.13504 · eess.AS · December 29, 2023

Open Access

TL;DR

This paper introduces a novel attention-based Transformer model for blind room volume estimation from noisy speech, outperforming CNN-based methods by leveraging self-attention, transfer learning, and data augmentation.

Contribution

It presents the first purely attention-based approach for blind room volume estimation, eliminating the need for CNNs and demonstrating improved accuracy in real-world conditions.

Findings

1. The Transformer model outperforms CNN-based models in accuracy.

2. Transfer learning and data augmentation enhance model performance.

3. The approach is effective across diverse acoustic environments.

Abstract

In recent years, dynamic parameterization of acoustic environments has attracted increasing attention in the field of audio processing. One of the key parameters that characterize local room acoustics, independently of the orientation and directivity of sources and receivers, is the geometric room volume. Convolutional neural networks (CNNs) have been widely adopted as the main models for blind room acoustic parameter estimation, which aims to learn a direct mapping from audio spectrograms to the corresponding labels. Following the recent trend toward self-attention mechanisms, this paper introduces a purely attention-based model that blindly estimates room volume from single-channel noisy speech signals. We demonstrate the feasibility of eliminating the reliance on CNNs for this task. The proposed Transformer architecture takes Gammatone magnitude spectral coefficients and phase…
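To make the idea of a purely attention-based mapping from spectral features to a scalar label concrete, the following is a minimal sketch and not the authors' model. It assumes a single attention head, random untrained weights, mean pooling over time, and a linear regression head. The feature matrix is a random stand-in for a Gammatone-style time-frequency representation, and all names and sizes (`estimate_volume_score`, `n_bands`, `d_model`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    # Scaled dot-product self-attention over a (T, d_model) token sequence.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # (T, T), each row sums to 1
    return weights @ v, weights


def estimate_volume_score(features, params):
    # Hypothetical pipeline: embed each frame, attend across frames,
    # mean-pool over time, then regress a single scalar.
    tokens = features @ params["w_embed"]  # (T, d_model)
    attended, weights = self_attention(
        tokens, params["w_q"], params["w_k"], params["w_v"]
    )
    pooled = attended.mean(axis=0)  # (d_model,)
    score = float(pooled @ params["w_out"] + params["b_out"])
    return score, weights


# Illustrative sizes only; none of these values come from the paper.
rng = np.random.default_rng(0)
n_frames, n_bands, d_model = 50, 20, 16
params = {
    "w_embed": rng.standard_normal((n_bands, d_model)) * 0.1,
    "w_q": rng.standard_normal((d_model, d_model)) * 0.1,
    "w_k": rng.standard_normal((d_model, d_model)) * 0.1,
    "w_v": rng.standard_normal((d_model, d_model)) * 0.1,
    "w_out": rng.standard_normal(d_model) * 0.1,
    "b_out": 0.0,
}

# Random stand-in for a (frames x frequency bands) feature matrix.
features = rng.standard_normal((n_frames, n_bands))
score, weights = estimate_volume_score(features, params)
print(weights.shape)
```

With untrained weights the scalar output is meaningless. The sketch only shows that every frame can attend to every other frame without any convolutional layer, which is the structural point the abstract makes.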

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Advanced Adaptive Filtering Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Adam · Layer Normalization · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Dense Connections