Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization
Hao Jiang, Calvin Murdock, Vamsi Krishna Ithapu

TL;DR
This paper introduces a novel deep learning method for egocentric audio-visual active speaker localization that accurately detects and localizes speakers in complex, noisy environments using video and multi-channel audio.
Contribution
The paper presents an end-to-end deep learning approach that localizes speakers in all directions, including those outside the camera view, and detects the wearer's own voice activity, outperforming previous methods.
Findings
Superior localization accuracy in challenging conditions
Real-time processing capability
Robustness against noise and visual clutter
Abstract
Augmented reality devices have the potential to enhance human perception and enable other assistive functionalities in complex conversational environments. Effectively capturing the audio-visual context necessary for understanding these social interactions first requires detecting and localizing the voice activities of the device wearer and the surrounding people. These tasks are challenging due to their egocentric nature: the wearer's head motion may cause motion blur, surrounding people may appear in difficult viewing angles, and there may be occlusions, visual clutter, audio noise, and bad lighting. Under these conditions, previous state-of-the-art active speaker detection methods do not give satisfactory results. Instead, we tackle the problem from a new setting using both video and multi-channel microphone array audio. We propose a novel end-to-end deep learning approach that is…
Taxonomy
Topics: Speech and Audio Processing · Indoor and Outdoor Localization Technologies · Advanced Adaptive Filtering Techniques
