Audio Visual Speaker Localization from EgoCentric Views
Jinzheng Zhao, Yong Xu, Xinyuan Qian, Wenwu Wang

TL;DR
This paper introduces a transformer-based egocentric audio-visual speaker localization method that addresses challenges like occlusion and speaker disappearance, demonstrating promising results on a new dataset and real-world scenarios.
Contribution
It presents a novel transformer-based audio-visual fusion approach for egocentric speaker localization and a new dataset that simulates out-of-view scenarios.
Findings
Achieves state-of-the-art results in active speaker detection
Effective in multi-speaker scenarios with real-world data
Shows robustness to occlusions and speaker disappearance
Abstract
The use of audio and visual modalities for speaker localization has been well studied in the literature by exploiting their complementary characteristics. However, most previous works employ the setting of static sensors mounted at fixed positions. Unlike them, in this work, we explore the ego-centric setting, where the heterogeneous sensors are embodied and could be moving with a human to facilitate speaker localization. Compared to the static scenario, the ego-centric setting is more realistic for smart-home applications, e.g., a service robot. However, this also brings new challenges such as blurred images, frequent speaker disappearance from the field of view of the wearer, and occlusions. In this paper, we study egocentric audio-visual speaker DOA estimation and deal with the challenges mentioned above. Specifically, we propose a transformer-based audio-visual fusion method to…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Indoor and Outdoor Localization Technologies
