Autoregressive Guidance of Deep Spatially Selective Filters using Bayesian Tracking for Efficient Extraction of Moving Speakers
Jakob Kienegger, Timo Gerkmann

TL;DR
This paper introduces a Bayesian tracking method that autoregressively guides deep spatial filters for moving speaker enhancement, improving accuracy and robustness in dynamic scenarios with minimal additional computational cost.
Contribution
It proposes a novel Bayesian tracking approach compatible with deep spatial filters and demonstrates its effectiveness with a new social force model-based dataset.
Findings
Autoregressive guidance improves tracking accuracy.
Enhanced speech quality with negligible computational overhead.
Method generalizes well to real-world, challenging acoustic environments.
Abstract
Deep spatially selective filters achieve high-quality enhancement with real-time capable architectures for stationary speakers of known directions. To retain this level of performance in dynamic scenarios when only the speakers' initial directions are given, accurate, yet computationally lightweight tracking algorithms become necessary. Assuming a frame-wise causal processing style, temporal feedback allows for leveraging the enhanced speech signal to improve tracking performance. In this work, we investigate strategies to incorporate the enhanced signal into lightweight tracking algorithms and autoregressively guide deep spatial filters. Our proposed Bayesian tracking algorithms are compatible with arbitrary deep spatial filters. To increase the realism of simulated trajectories during development and evaluation, we propose and publish a novel dataset based on the social force model.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech and Audio Processing · Speech Recognition and Synthesis · Face recognition and analysis
