Data-driven 3D Room Geometry Inference with a Linear Loudspeaker Array and a Single Microphone

Cagdas Tuna; Altan Akat; H. Nazim Bicer; Andreas Walther; Emanuël A. P. Habets

arXiv:2308.14611·eess.AS·August 29, 2023

Open Access

TL;DR

This paper introduces a data-driven method for inferring 3D room geometry from room impulse responses (RIRs) measured between a linear loudspeaker array and a single microphone. A convolutional recurrent neural network localizes the reflectors from a time-domain acoustic beamforming map of the RIRs.

Contribution

It proposes a novel supervised deep learning approach that generalizes well to real RIRs, eliminating the need for semi-supervised intermediate steps in room geometry inference.

Findings

01 Achieves accuracy comparable to baseline model-driven methods

02 Generalizes effectively to unseen RIRs

03 Provides a fully automated RGI framework

Abstract

Knowing the room geometry may be very beneficial for many audio applications, including sound reproduction, acoustic scene analysis, and sound source localization. Room geometry inference (RGI) deals with the problem of reflector localization (RL) based on a set of room impulse responses (RIRs). Motivated by the increasing popularity of commercially available soundbars, this article presents a data-driven 3D RGI method using RIRs measured from a linear loudspeaker array to a single microphone. A convolutional recurrent neural network (CRNN) is trained using simulated RIRs in a supervised fashion for RL. The Radon transform, which is equivalent to delay-and-sum beamforming, is applied to multi-channel RIRs, and the resulting time-domain acoustic beamforming map is fed into the CRNN. The room geometry is inferred from the microphone position and the reflector locations estimated by the…
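The beamforming step described in the abstract can be illustrated with a minimal sketch of a time-domain delay-and-sum map (a slant stack, i.e. a discrete linear Radon transform) over multi-channel RIRs. This is not the authors' implementation. The far-field plane-wave steering model, the function name `das_beamforming_map`, the sign convention of the delays, the angle grid, and the synthetic Gaussian "reflection" used to exercise it are all assumptions made for illustration.

```python
import numpy as np

def das_beamforming_map(rirs, positions, angles_deg, fs, c=343.0):
    """Hypothetical helper: time-domain delay-and-sum map of multi-channel RIRs.

    rirs       : array (L, N), one RIR per loudspeaker
    positions  : array (L,), loudspeaker coordinates along the array axis in m
    angles_deg : array (A,), steering angles in degrees (assumed far field)
    returns    : array (A, N), beamformer output per angle and time sample
    """
    num_ch, num_samples = rirs.shape
    t = np.arange(num_samples) / fs
    bf_map = np.zeros((len(angles_deg), num_samples))
    for a, theta in enumerate(np.deg2rad(angles_deg)):
        # Assumed plane-wave model: channel l is offset by x_l * sin(theta) / c.
        delays = positions * np.sin(theta) / c
        for l in range(num_ch):
            # Read channel l along the slanted line t + delay (linear interpolation).
            bf_map[a] += np.interp(t + delays[l], t, rirs[l], left=0.0, right=0.0)
    return bf_map / num_ch

# Synthetic check: one Gaussian pulse arriving as a plane wave from 20 degrees.
fs = 48000
positions = np.linspace(-0.5, 0.5, 8)      # assumed 1 m array, 8 loudspeakers
angles = np.arange(-60, 61, 10)            # assumed steering grid in degrees
t = np.arange(960) / fs
t0, sigma, c = 0.01, 1e-4, 343.0
arrivals = t0 + positions * np.sin(np.deg2rad(20.0)) / c
rirs = np.exp(-0.5 * ((t[None, :] - arrivals[:, None]) / sigma) ** 2)

bf_map = das_beamforming_map(rirs, positions, angles, fs, c)
peak_a, peak_n = np.unravel_index(np.argmax(bf_map), bf_map.shape)
print("peak angle (deg):", angles[peak_a], "peak sample:", peak_n)
```

In this sketch each row of `bf_map` is the array output steered to one angle, so a reflection appears as a localized peak in the angle-time plane. A map of this kind is what the abstract describes as the input to the CRNN, although the paper's exact parameterization may differ.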

Taxonomy

Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Music and Audio Processing