Flexible Multi-Channel Target Speaker Extraction Using Geometry-Conditioned Spatially Selective Non-linear Filters
Jiatong Li, Wiebke Middelberg, Simon Doclo

TL;DR
This paper introduces a geometry-conditioned non-linear filter for target speaker extraction that adapts to different microphone array geometries, improving robustness and spatial selectivity.
Contribution
The paper proposes a geometry-conditioning branch based on FiLM layers and a feature that jointly encodes the DOA and the microphone positions (DOA-MPE), with the aim of improving generalization across array geometries.
Findings
GC-SSF outperforms previous methods on mismatched array geometries.
The proposed approach maintains high spatial selectivity.
Experiments on circular, uniform linear, and random microphone arrays support the improved robustness.
Abstract
Recently, a spatially selective non-linear filter (SSF) has been proposed for target speaker extraction, using the target direction-of-arrival (DOA) as a spatial cue. Since learned intermediate features are tied to the microphone geometry, the performance of the SSF degrades significantly when evaluated on mismatched array geometries. In this paper, we propose a geometry-conditioned SSF (GC-SSF), which incorporates a geometry-conditioning branch based on FiLM layers. Furthermore, we propose a feature that jointly encodes the DOA and the microphone positions (DOA-MPE). The conditioning branch modulates the intermediate feature maps of the SSF using the DOA-MPE feature to capture the spatial relationship between the microphone positions and the target speaker. Experimental results across circular, uniform linear, and random microphone arrays show that the proposed GC-SSF generalizes…
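The FiLM-style modulation mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the conditioning vector here is a hypothetical stand-in (far-field relative delays computed from the DOA and the microphone positions) rather than the paper's DOA-MPE feature, and the array layout, tensor shapes, and random weights are all assumptions chosen for illustration. The sketch only shows the general idea that a vector depending on both geometry and target direction predicts a per-channel scale and shift for an intermediate feature map.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def far_field_delays(mic_xy, doa_rad):
    """Relative arrival delays (s) of a far-field plane wave at each microphone.

    mic_xy: (M, 2) microphone positions in metres, relative to the array centre.
    doa_rad: target direction of arrival in radians.
    """
    u = np.array([np.cos(doa_rad), np.sin(doa_rad)])  # unit vector toward the source
    # Microphones displaced toward the source receive the wavefront earlier.
    return -(mic_xy @ u) / SPEED_OF_SOUND


def film(features, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation: per-channel scale and shift from cond.

    features: (C, F, T) intermediate feature map.
    cond: (D,) conditioning vector.
    """
    gamma = cond @ w_gamma + b_gamma  # (C,) per-channel scale
    beta = cond @ w_beta + b_beta     # (C,) per-channel shift
    return gamma[:, None, None] * features + beta[:, None, None]


rng = np.random.default_rng(0)

# Hypothetical 4-microphone circular array with a 5 cm radius.
mic_xy = np.array([[0.05, 0.0], [0.0, 0.05], [-0.05, 0.0], [0.0, -0.05]])
delays = far_field_delays(mic_xy, np.deg2rad(0.0))
cond = delays * 1e3  # stand-in conditioning vector, in milliseconds

channels, freqs, frames = 8, 16, 10
features = rng.standard_normal((channels, freqs, frames))

# Randomly initialised linear maps; in a trained model these would be learned.
w_gamma = 0.1 * rng.standard_normal((cond.size, channels))
w_beta = 0.1 * rng.standard_normal((cond.size, channels))
out = film(features, cond, w_gamma, np.ones(channels), w_beta, np.zeros(channels))
```

Initialising the scale bias to one and the shift bias to zero means that zero weights leave the feature map unchanged, so the conditioning starts as an identity mapping. Note that this stand-in vector has one entry per microphone, so its length changes with the array size; a geometry-flexible model would need an encoding that handles this, which the sketch does not attempt.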
