ReZero: Region-customizable Sound Extraction

Rongzhi Gu; Yi Luo

arXiv:2308.16892·eess.AS·September 1, 2023

TL;DR

ReZero is a flexible framework for multi-channel sound extraction within user-defined spatial regions. Unlike blind separation or methods tied to a fixed, predefined region, it extracts the target sounds inside a region the user specifies, with demonstrated effectiveness on simulated and real-recorded data.

Contribution

The paper presents a general framework for region-customizable sound extraction. It comprises definitions of different types of spatial regions, methods for region feature extraction and aggregation, and a multi-channel extension of the band-split RNN (BSRNN) model tailored to this task.

Findings

01 Effective sound extraction within various spatial regions.

02 Works on both simulated and real-recorded data.

03 Flexible for different microphone geometries.

Abstract

We introduce region-customizable sound extraction (ReZero), a general and flexible framework for the multi-channel region-wise sound extraction (R-SE) task. The R-SE task aims at extracting all active target sounds (e.g., human speech) within a specific, user-defined spatial region, which differs from conventional tasks where blind separation or a fixed, predefined spatial region is typically assumed. The spatial region can be defined as an angular window, a sphere, a cone, or other geometric patterns. As a solution to the R-SE task, the proposed ReZero framework includes (1) definitions of different types of spatial regions, (2) methods for region feature extraction and aggregation, and (3) a multi-channel extension of the band-split RNN (BSRNN) model specified for the R-SE task. We design experiments for different microphone array geometries, different types of…
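
To make the region types named in the abstract concrete, the following is a minimal geometric sketch of how one might test whether a sound source lies inside an angular window, a sphere, or a cone. It is not the paper's implementation: the function names, the parameterization (center angle and width, center and radius, apex, axis and half-angle), and the convention that the microphone array sits at the origin are all assumptions made for illustration.

```python
import math

# Hypothetical helpers: names and parameterization are illustrative assumptions,
# not taken from the ReZero paper.

def in_angular_window(source_xy, center_deg, width_deg):
    """True if the source azimuth, seen from an array at the origin,
    lies within width_deg centered on center_deg."""
    azimuth = math.degrees(math.atan2(source_xy[1], source_xy[0]))
    # Wrap the difference into [-180, 180) so windows crossing +/-180 work.
    diff = (azimuth - center_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= width_deg / 2.0

def in_sphere(source, center, radius):
    """True if the source is within radius of center (same units)."""
    return math.dist(source, center) <= radius

def in_cone(source, apex, axis, half_angle_deg):
    """True if the direction from apex to source is within half_angle_deg
    of the cone axis. The axis must be a non-zero vector."""
    v = [s - a for s, a in zip(source, apex)]
    norm_v = math.hypot(*v)
    if norm_v == 0.0:
        return True  # the apex itself is treated as inside the cone
    norm_axis = math.hypot(*axis)
    cos_angle = sum(vi * ai for vi, ai in zip(v, axis)) / (norm_v * norm_axis)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle)) <= half_angle_deg
```

Under these assumptions, a source at (1, 1) falls inside a 30-degree window centered at 45 degrees, and a source at (1, 0.1, 0) falls inside a 10-degree half-angle cone pointing along the x-axis from the origin. An R-SE system would keep the sources for which such a test is true and suppress the rest; how ReZero encodes the region for the network is described in the paper, not here.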

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Advanced Adaptive Filtering Techniques

Methods: Residual Connection · ReZero