Evaluating multimodal emotion recognition in proactive conversational agents: A user study
Adnana Dragut, Raquel Lacuesta, F. Xavier Gaya-Morey, Jose M. Buades-Rubio

TL;DR
This study evaluates multimodal emotion recognition in proactive conversational agents, revealing that linguistic analysis is more reliable than facial cues for detecting user emotions, and emphasizing the importance of adaptive, context-aware interactions.
Contribution
It introduces a multimodal emotion recognition framework and provides empirical insights into its effectiveness and limitations in real user interactions with AI agents.
Findings
Facial cues often do not match users' internal emotional states.
Linguistic analysis proved more reliable than visual cues.
Adaptive conversational strategies can elicit specific emotions.
Abstract
This article presents a multimodal emotion recognition module integrated into a proactive Socially Interactive Agent (SIA) powered by generative artificial intelligence. The system evaluates real-time affective states through two distinct channels: a computer vision-based facial recognition module and a semantic linguistic analysis engine. To validate the framework, an empirical study was conducted with 20 users who engaged in dynamic, unscripted dialogues with the conversational agent. The findings reveal a significant discrepancy between automated visual cues and actual internal emotional states. When interacting with the AI, users consistently exhibited a "poker face" effect, displaying serious, concentrated facial expressions even when experiencing positive emotions. Consequently, the generative AI linguistic analysis proved significantly more reliable, by contextualizing the users'…
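The abstract's central comparison checks each channel's output against the user's actual internal state. A minimal Python sketch can show what that comparison computes. It assumes each dialogue turn carries a self-reported emotion plus one label per channel. The `Turn` type, the label strings, the helper names `agreement` and `confusion`, and the toy data are all hypothetical and are not taken from the paper, whose evaluation protocol may differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Turn:
    # Hypothetical per-turn record; the paper's real data schema is not given.
    self_report: str  # emotion the user reported actually feeling
    facial: str       # label from the (assumed) facial-expression channel
    linguistic: str   # label from the (assumed) linguistic-analysis channel


def agreement(turns, channel):
    """Fraction of turns where a channel's label matches the self-report."""
    if not turns:
        raise ValueError("no turns to evaluate")
    hits = sum(1 for t in turns if getattr(t, channel) == t.self_report)
    return hits / len(turns)


def confusion(turns, channel):
    """Count (self_report, predicted) pairs to expose systematic errors,
    e.g. positive feelings read as neutral ("poker face") expressions."""
    return Counter((t.self_report, getattr(t, channel)) for t in turns)


# Toy data invented for illustration only; these are NOT results from the study.
turns = [
    Turn("happy", "neutral", "happy"),
    Turn("happy", "neutral", "happy"),
    Turn("surprised", "neutral", "surprised"),
    Turn("neutral", "neutral", "neutral"),
]

print(agreement(turns, "facial"))      # 0.25
print(agreement(turns, "linguistic"))  # 1.0
print(confusion(turns, "facial")[("happy", "neutral")])  # 2
```

A confusion count is included alongside raw agreement because a single accuracy figure hides *how* a channel fails. A cluster of off-diagonal pairs such as `("happy", "neutral")` is the kind of pattern the abstract describes as the "poker face" effect.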
