ReZero: Region-customizable Sound Extraction
Rongzhi Gu, Yi Luo

TL;DR
ReZero introduces a flexible framework for multi-channel sound extraction within user-defined spatial regions, enabling targeted sound separation beyond traditional fixed-region methods, with demonstrated effectiveness on simulated and real data.
Contribution
The paper presents a novel, general framework for region-customizable sound extraction, including new definitions, feature extraction methods, and a multi-channel RNN model tailored for this task.
Findings
Effective sound extraction within various spatial regions.
Works on both simulated and real-recorded data.
Flexible across different microphone array geometries.
Abstract
We introduce region-customizable sound extraction (ReZero), a general and flexible framework for the multi-channel region-wise sound extraction (R-SE) task. The R-SE task aims at extracting all active target sounds (e.g., human speech) within a specific, user-defined spatial region, which differs from conventional tasks where either blind separation or a fixed, predefined spatial region is typically assumed. The spatial region can be defined as an angular window, a sphere, a cone, or other geometric patterns. As a solution to the R-SE task, the proposed ReZero framework includes (1) definitions of different types of spatial regions, (2) methods for region feature extraction and aggregation, and (3) a multi-channel extension of the band-split RNN (BSRNN) model specified for the R-SE task. We design experiments for different microphone array geometries, different types of…
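To make the region types named in the abstract concrete, the following is a minimal sketch of how an angular window, a sphere, and a cone could each be expressed as a point-membership test. It is an illustration under stated assumptions, not the paper's implementation: the class names, the `contains` method, the `sources_in_region` helper, and the coordinate convention (microphone array centre at the origin, azimuth measured counter-clockwise from the +x axis, cone apex at the origin) are all hypothetical choices made here.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical convention: (x, y, z) in metres, array centre at the origin.
Point = Tuple[float, float, float]


@dataclass
class AngularWindow:
    """Azimuth range [lo_deg, hi_deg], counter-clockwise from the +x axis."""
    lo_deg: float
    hi_deg: float

    def contains(self, p: Point) -> bool:
        if self.hi_deg - self.lo_deg >= 360.0:
            return True  # the window covers the full circle
        az = math.degrees(math.atan2(p[1], p[0])) % 360.0
        lo, hi = self.lo_deg % 360.0, self.hi_deg % 360.0
        # A window such as [-30, 30] wraps past 0 degrees after the modulo.
        return lo <= az <= hi if lo <= hi else (az >= lo or az <= hi)


@dataclass
class Sphere:
    """All points within `radius` of `center`."""
    center: Point
    radius: float

    def contains(self, p: Point) -> bool:
        return math.dist(p, self.center) <= self.radius


@dataclass
class Cone:
    """Unbounded cone with its apex at the origin, opening around `axis`."""
    axis: Point
    half_angle_deg: float

    def contains(self, p: Point) -> bool:
        norm_p = math.hypot(*p)
        norm_a = math.hypot(*self.axis)
        if norm_p == 0.0:
            return True  # the apex itself is treated as inside
        cos_angle = sum(a * b for a, b in zip(p, self.axis)) / (norm_p * norm_a)
        cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
        return math.degrees(math.acos(cos_angle)) <= self.half_angle_deg


def sources_in_region(region, positions: List[Point]) -> List[int]:
    """Return the indices of the source positions that fall inside `region`."""
    return [i for i, p in enumerate(positions) if region.contains(p)]


positions = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.5, 0.5, 2.0)]
front = sources_in_region(AngularWindow(-30.0, 30.0), positions)
near_side = sources_in_region(Sphere((0.0, 1.0, 0.0), 0.5), positions)
overhead = sources_in_region(Cone((0.0, 0.0, 1.0), 30.0), positions)
```

In an R-SE setting, the sources whose positions pass such a test would be the targets to extract, and everything else would be treated as interference. How the region is actually encoded as a model input (the region feature extraction and aggregation step) is not covered by this sketch.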
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Advanced Adaptive Filtering Techniques
Methods: Residual Connection · ReZero
