Flexible Multi-Channel Target Speaker Extraction Using Geometry-Conditioned Spatially Selective Non-linear Filters

Jiatong Li; Wiebke Middelberg; Simon Doclo

arXiv:2605.18442·eess.AS·May 19, 2026

TL;DR

This paper introduces a geometry-conditioned non-linear filter for target speaker extraction that adapts to different microphone array geometries, improving robustness and spatial selectivity.

Contribution

The paper proposes a geometry-conditioning branch based on FiLM layers and a feature that jointly encodes the DOA and the microphone positions (DOA-MPE), with the aim of improving generalization across array geometries.

Findings

01

GC-SSF outperforms previous methods on mismatched array geometries.

02

The proposed approach maintains high spatial selectivity.

03

Experiments cover circular, uniform linear, and random microphone arrays.

Abstract

Recently, a spatially selective non-linear filter (SSF) has been proposed for target speaker extraction, using the target direction-of-arrival (DOA) as a spatial cue. Since learned intermediate features are tied to the microphone geometry, the performance of the SSF degrades significantly when evaluated on mismatched array geometries. In this paper, we propose a geometry-conditioned SSF (GC-SSF), which incorporates a geometry-conditioning branch based on FiLM layers. Furthermore, we propose a feature that jointly encodes the DOA and the microphone positions (DOA-MPE). The conditioning branch modulates the intermediate feature maps of the SSF using the DOA-MPE feature to capture the spatial relationship between the microphone positions and the target speaker. Experimental results across circular, uniform linear, and random microphone arrays show that the proposed GC-SSF generalizes…
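The FiLM-style modulation mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `film`, the single linear projections, the tensor shapes, and the random conditioning vector standing in for the DOA-MPE feature are all assumptions made for illustration. It shows only the general idea of FiLM, where a conditioning vector is mapped to a per-channel scale and shift that are applied to an intermediate feature map.

```python
import numpy as np


def film(features, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation (hypothetical helper, illustration only).

    features: (channels, frames) intermediate feature map
    cond:     (cond_dim,) conditioning vector (placeholder for a geometry feature)
    w_*, b_*: parameters of two linear projections producing scale and shift
    """
    gamma = cond @ w_gamma + b_gamma  # per-channel scale, shape (channels,)
    beta = cond @ w_beta + b_beta     # per-channel shift, shape (channels,)
    # Broadcast the scale and shift over the frame axis.
    return gamma[:, None] * features + beta[:, None]


rng = np.random.default_rng(0)
channels, frames, cond_dim = 4, 10, 6

features = rng.standard_normal((channels, frames))
cond = rng.standard_normal(cond_dim)  # assumed stand-in for the conditioning feature

# Assumed initialization: small random weights, scale bias 1, shift bias 0.
w_gamma = 0.1 * rng.standard_normal((cond_dim, channels))
b_gamma = np.ones(channels)
w_beta = 0.1 * rng.standard_normal((cond_dim, channels))
b_beta = np.zeros(channels)

modulated = film(features, cond, w_gamma, b_gamma, w_beta, b_beta)
```

With zero projection weights, a scale bias of one, and a shift bias of zero, the function returns the features unchanged, so the conditioning acts as a learned deviation from the unconditioned feature map.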
