EmoOmni: Bridging Emotional Understanding and Expression in Omni-Modal LLMs
Wenjie Tian, Zhixian Zhao, Jingbin Hu, Huakang Chen, Haohe Liu, Binshen Mu, Lei Xie

TL;DR
EmoOmni is a framework that improves how multimodal large language models understand and express emotion in complex real-world dialogues by introducing emotional reasoning and explicit emotional instructions.
Contribution
The paper introduces EmoOmni, a unified multimodal emotional dialogue framework built on the emotional Chain-of-Thought (E-CoT) and the EmoOmniPipe data pipeline, along with a new benchmark, EmoOmniEval, for systematic evaluation.
Findings
EmoOmni-7B performs comparably to larger models such as Qwen3Omni-30B-A3B-Thinking.
The emotional Chain-of-Thought improves emotional reasoning in multimodal dialogue.
The benchmark EmoOmniEval enables systematic assessment of emotional understanding and expression.
Abstract
The evolution of Omni-Modal Large Language Models (Omni-LLMs) has revolutionized human-computer interaction, enabling unified audio-visual perception and speech response. However, existing Omni-LLMs struggle with complex real-world scenarios, often leading to superficial understanding and contextually mismatched emotional responses. This issue is further intensified by Omni-LLMs' Thinker-Talker architectures, which are implicitly connected through hidden states, leading to the loss of emotional details. In this work, we present EmoOmni, a unified framework for accurate understanding and expression in multimodal emotional dialogue. At its core, we introduce the emotional Chain-of-Thought (E-CoT), which enforces a reasoning process from fine-grained multimodal perception to textual response. Moreover, we explicitly treat E-CoT as high-level emotional instructions that guide the talker, enabling…
Taxonomy
Topics: Multimodal Machine Learning Applications · Emotion and Mood Recognition · Speech and dialogue systems
