EmoOmni: Bridging Emotional Understanding and Expression in Omni-Modal LLMs

Wenjie Tian; Zhixian Zhao; Jingbin Hu; Huakang Chen; Haohe Liu; Binshen Mu; Lei Xie

arXiv:2602.21900 · cs.SD · March 10, 2026



TL;DR

EmoOmni is a framework that helps omni-modal large language models understand and express emotions accurately in complex real-world dialogues. It does this by introducing explicit emotional reasoning and using that reasoning as emotional instructions for speech generation.

Contribution

The paper introduces EmoOmni, a unified multimodal emotional dialogue framework built around the emotional Chain-of-Thought (E-CoT) and the EmoOmniPipe data pipeline, together with a new benchmark, EmoOmniEval, for systematic evaluation.

Findings

1. EmoOmni-7B performs comparably to larger models such as Qwen3-Omni-30B-A3B-Thinking.

2. The emotional Chain-of-Thought improves emotional reasoning in multimodal dialogue.

3. The EmoOmniEval benchmark enables systematic assessment of emotional understanding and expression.

Abstract

The evolution of Omni-Modal Large Language Models (Omni-LLMs) has revolutionized human-computer interaction, enabling unified audio-visual perception and speech response. However, existing Omni-LLMs struggle with complex real-world scenarios, often producing superficial understanding and contextually mismatched emotional responses. The problem is aggravated by the Thinker-Talker architecture of Omni-LLMs, in which the two components are connected only implicitly through hidden states, so emotional details are lost between them. In this work, we present EmoOmni, a unified framework for accurate understanding and expression in multimodal emotional dialogue. At its core, we introduce the emotional Chain-of-Thought (E-CoT), which enforces reasoning from fine-grained multimodal perception to the textual response. Moreover, we explicitly treat E-CoT as high-level emotional instructions that guide the talker, enabling…
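The abstract contrasts implicit hidden-state coupling between Thinker and Talker with passing E-CoT to the talker as explicit instructions. The sketch below is illustrative only and is not the authors' implementation. The `ECoT` fields, the `to_talker_instruction` helper, and the instruction template are all hypothetical. They show one way a structured reasoning trace (perception cues, inferred emotion, textual reply) could be rendered as a text instruction that a speech generator conditions on.

```python
from dataclasses import dataclass


@dataclass
class ECoT:
    """Hypothetical E-CoT record: fine-grained perception -> emotion -> response."""
    visual_cues: str       # e.g. facial expression observed in the video
    acoustic_cues: str     # e.g. pitch, tempo, or tone of the user's speech
    user_emotion: str      # emotion inferred from the combined cues
    response_emotion: str  # emotion the spoken reply should convey
    response_text: str     # textual reply produced by the Thinker


def to_talker_instruction(cot: ECoT) -> str:
    """Render the E-CoT as an explicit text instruction for the Talker,
    rather than relying only on hidden states (hypothetical template)."""
    return (
        f"Speak in a {cot.response_emotion} tone. "
        f"The user sounds {cot.user_emotion} "
        f"(visual: {cot.visual_cues}; audio: {cot.acoustic_cues}). "
        f"Text: {cot.response_text}"
    )


cot = ECoT(
    visual_cues="downcast eyes",
    acoustic_cues="low, slow voice",
    user_emotion="sad",
    response_emotion="gentle and reassuring",
    response_text="I'm sorry to hear that. Do you want to talk about it?",
)
print(to_talker_instruction(cot))
```

Keeping the emotional cues as readable text in this sketch makes them inspectable and hard to lose, which is the motivation the abstract gives for explicit instructions over purely implicit hidden-state transfer.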

Taxonomy

Topics: Multimodal Machine Learning Applications · Emotion and Mood Recognition · Speech and dialogue systems