Self-Steering Deep Non-Linear Spatially Selective Filters for Efficient Extraction of Moving Speakers under Weak Guidance

Jakob Kienegger; Alina Mannanova; Huajian Fang; Timo Gerkmann

arXiv:2507.02791 · eess.AS · July 8, 2025


TL;DR

This paper introduces a low-complexity, self-steering pipeline for extracting moving speakers. A particle filter tracks the target direction in place of a resource-intensive data-driven tracker, and temporal feedback from the enhanced speech signal compensates for the particle filter's limited modeling capabilities.

Contribution

The paper proposes a self-steering pipeline that couples a particle filter with temporal feedback from the spatially selective filter's output, enabling efficient moving-speaker extraction in resource-constrained scenarios such as real-time speech enhancement.

Findings

01 Significant improvement in tracking accuracy with the combined approach

02 Enhanced speech quality demonstrated in synthetic dataset evaluations

03 Listening tests favor the proposed self-steering pipeline over existing methods

Abstract

Recent works on deep non-linear spatially selective filters demonstrate exceptional enhancement performance with computationally lightweight architectures for stationary speakers of known directions. However, to maintain this performance in dynamic scenarios, resource-intensive data-driven tracking algorithms become necessary to provide precise spatial guidance conditioned on the initial direction of a target speaker. As this additional computational overhead hinders application in resource-constrained scenarios such as real-time speech enhancement, we present a novel strategy utilizing a low-complexity tracking algorithm in the form of a particle filter instead. Assuming a causal, sequential processing style, we introduce temporal feedback to leverage the enhanced speech signal of the spatially selective filter to compensate for the limited modeling capabilities of the particle filter.…
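To make the tracking idea concrete, the following is a minimal sketch of a generic bootstrap particle filter that follows a speaker's azimuth from frame to frame, starting from a known initial direction. It is not the authors' implementation. The random-walk motion model, the von Mises-style likelihood, and every name and parameter (`particle_filter_step`, `track`, `process_std`, `obs_kappa`) are assumptions chosen for illustration. The per-frame `observation` stands in for whatever noisy directional cue the tracker receives; in the paper's pipeline that cue is informed by the enhanced speech signal through temporal feedback, which this sketch does not model.

```python
import math
import random


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def circular_mean(angles, weights):
    """Weighted mean direction of a set of angles."""
    s = sum(w * math.sin(a) for a, w in zip(angles, weights))
    c = sum(w * math.cos(a) for a, w in zip(angles, weights))
    return math.atan2(s, c)


def particle_filter_step(particles, observation, rng,
                         process_std=0.05, obs_kappa=20.0):
    """One predict/update/resample cycle (hypothetical parameters)."""
    # Predict: assumed random-walk motion model on the azimuth.
    predicted = [wrap_angle(p + rng.gauss(0.0, process_std)) for p in particles]
    # Update: assumed von Mises-style likelihood of the noisy observation.
    weights = [math.exp(obs_kappa * (math.cos(observation - p) - 1.0))
               for p in predicted]
    total = sum(weights)
    weights = [w / total for w in weights]
    estimate = circular_mean(predicted, weights)
    # Resample: multinomial resampling proportional to the weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


def track(observations, initial_direction, num_particles=200, seed=0):
    """Track the azimuth over a sequence of frames, causally."""
    rng = random.Random(seed)
    # Weak guidance: particles start around the known initial direction.
    particles = [wrap_angle(initial_direction + rng.gauss(0.0, 0.1))
                 for _ in range(num_particles)]
    estimates = []
    for obs in observations:
        particles, est = particle_filter_step(particles, obs, rng)
        estimates.append(est)
    return estimates
```

Each frame costs only a few arithmetic operations per particle, which is the kind of low-complexity tracking the abstract contrasts with data-driven trackers. In a self-steering setup, the per-frame estimate would be passed to the spatially selective filter as its steering direction.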
