Gaze-enhanced Crossmodal Embeddings for Emotion Recognition
Ahmed Abdou, Ekta Sood, Philipp Müller, Andreas Bulling

TL;DR
This paper introduces a crossmodal emotion recognition method that explicitly incorporates gaze information and outperforms the previous state of the art for both audio-only and video-only emotion classification on the One-Minute Gradual Emotion Recognition dataset.
Contribution
It presents a new approach that integrates an explicit representation of gaze into a crossmodal emotion embedding framework, and it reports ablation experiments that analyze different gaze representations.
Findings
Outperforms the previous state of the art for both audio-only and video-only emotion classification on the One-Minute Gradual Emotion Recognition dataset.
Gaze information significantly improves emotion recognition accuracy.
Effective strategies for integrating gaze into crossmodal models are identified.
Abstract
Emotional expressions are inherently multimodal -- integrating facial behavior, speech, and gaze -- but their automatic recognition is often limited to a single modality, e.g. speech during a phone call. While previous work proposed crossmodal emotion embeddings to improve monomodal recognition performance, it did not include an explicit representation of gaze, despite its importance. We propose a new approach to emotion recognition that incorporates an explicit representation of gaze in a crossmodal emotion embedding framework. We show that our method outperforms the previous state of the art for both audio-only and video-only emotion classification on the popular One-Minute Gradual Emotion Recognition dataset. Furthermore, we report extensive ablation experiments and provide detailed insights into the performance of different state-of-the-art gaze representations and integration…
