Adaptive Rotary Steering with Joint Autoregression for Robust Extraction of Closely Moving Speakers in Dynamic Scenarios

Jakob Kienegger; Timo Gerkmann

arXiv:2601.12345 · eess.AS · January 22, 2026

Open Access

TL;DR

This paper introduces a joint autoregressive framework that makes adaptive rotary steering more robust in dynamic multi-speaker scenarios, tracking and extracting closely spaced moving speakers by exploiting the temporal-spectral correlations of speech.

Contribution

The paper proposes a joint autoregressive framework that feeds the processed recording back as an additional guide to both the tracking and the enhancement algorithm, improving tracking and enhancement of closely spaced moving speakers.

Findings

01 Significant improvement in tracking accuracy for closely spaced speakers
02 Outperforms non-autoregressive methods on synthetic datasets
03 Effective in real-world scenarios with multiple crossings

Abstract

Recent advances in deep spatial filtering for Ambisonics demonstrate strong performance in stationary multi-speaker scenarios by rotating the sound field toward a target speaker prior to multi-channel enhancement. For applicability in dynamic acoustic conditions with moving speakers, we propose to automate this rotary steering using an interleaved tracking algorithm conditioned on the target's initial direction. However, for nearby or crossing speakers, robust tracking becomes difficult and spatial cues become less effective for enhancement. By incorporating the processed recording as an additional guide into both algorithms, our novel joint autoregressive framework leverages temporal-spectral correlations of speech to resolve spatially challenging speaker constellations. Consequently, our proposed method significantly improves tracking and enhancement of closely spaced speakers, consistently…
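
The rotary steering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes first-order Ambisonics with channels ordered (W, X, Y, Z), unit gain on W, and a rotation about the vertical axis only, so elevation is ignored. The function names `rotate_foa_yaw` and `encode_foa` are hypothetical. Given a per-sample azimuth track, such as a tracker might supply, the sound field is rotated so that the target direction is mapped to azimuth 0.

```python
import numpy as np


def rotate_foa_yaw(foa, azimuth):
    """Rotate a first-order Ambisonics signal so that `azimuth` maps to 0 rad.

    foa:     array of shape (4, T), channels assumed ordered (W, X, Y, Z).
    azimuth: scalar or array of shape (T,), target azimuth in radians.
    """
    foa = np.asarray(foa, dtype=float)
    az = np.broadcast_to(np.asarray(azimuth, dtype=float), foa.shape[1:])
    w, x, y, z = foa
    c, s = np.cos(az), np.sin(az)
    # Rotation by -azimuth about the vertical axis; W and Z are unaffected.
    x_rot = c * x + s * y
    y_rot = -s * x + c * y
    return np.stack([w, x_rot, y_rot, z])


def encode_foa(signal, azimuth):
    """Encode a horizontal plane wave into (W, X, Y, Z); assumed unit W gain."""
    signal = np.asarray(signal, dtype=float)
    az = np.broadcast_to(np.asarray(azimuth, dtype=float), signal.shape)
    return np.stack([
        signal,
        signal * np.cos(az),
        signal * np.sin(az),
        np.zeros_like(signal),
    ])
```

For a single source that moves along a known azimuth track, rotating with that same track concentrates the source in the W and X channels and cancels it in Y, which is the sense in which the rotated sound field always "faces" the target. With estimated rather than known directions, any tracking error shows up as residual energy in Y.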

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Face recognition and analysis