EgoAdapt: Enhancing Robustness in Egocentric Interactive Speaker Detection Under Missing Modalities
Xinyuan Qian, Xinjia Zhu, Alessio Brutti, Dong Liang

TL;DR
EgoAdapt is a novel framework that improves egocentric speaker detection by integrating visual cues, robust audio processing, and modality awareness, effectively handling missing data and noisy environments.
Contribution
This work introduces EgoAdapt, combining head orientation, lip movement, and modality awareness to enhance robustness in egocentric speaker detection under missing modalities.
Findings
EgoAdapt achieves 67.39% mAP on the Ego4D TTM benchmark.
It outperforms previous methods by 4.96% in accuracy.
The framework effectively handles missing modalities and noisy data.
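For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how average precision and accuracy can be computed for a binary "talking to me" decision. It is an illustration only: the function names, the 0.5 threshold, and the toy labels and scores are assumptions, and the exact evaluation protocol of the Ego4D TTM benchmark may differ.

```python
def average_precision(labels, scores):
    """Average precision for binary labels (1 = talking to me).

    Ranks samples by descending score and averages the precision
    measured at the rank of each positive sample.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    positives = sum(labels)
    if positives == 0:
        return 0.0  # convention chosen for this sketch: no positives, AP = 0
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / positives


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


# Toy data (hypothetical, not taken from the paper).
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
print(average_precision(toy_labels, toy_scores))
print(accuracy(toy_labels, toy_scores))
```

Average precision depends only on how the scores rank the samples, while accuracy depends on the chosen threshold, which is one reason a method's gains on the two metrics can differ.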
Abstract
The TTM (Talking to Me) task is a pivotal component in understanding human social interactions, aiming to determine who is engaged in conversation with the camera-wearer. Traditional models often face challenges in real-world scenarios because of missing visual data, background noise, and their neglect of head orientation. This study addresses these limitations by introducing EgoAdapt, an adaptive framework designed for robust egocentric "Talking to Me" speaker detection under missing modalities. Specifically, EgoAdapt incorporates three key modules: (1) a Visual Speaker Target Recognition (VSTR) module that captures head orientation as a non-verbal cue and lip movement as a verbal cue, allowing a comprehensive interpretation of both verbal and non-verbal signals to address TTM, setting it apart from tasks focused solely on detecting speaking status; (2) a Parallel Shared-weight Audio…
Taxonomy
Topics: Speech and Audio Processing · Emotion and Mood Recognition · Face recognition and analysis
