Emotion Omni: Enabling Empathetic Speech Response Generation through Large Language Models
Haoyu Wang, Guangyan Zhang, Jiale Chen, Jingyu Li, Yuehai Wang, Yiwen Guo

TL;DR
Emotion Omni is a novel speech LLM that understands emotional cues in user speech and generates empathetic responses, achieving high speech quality and emotional expressiveness with limited data and without large-scale pretraining.
Contribution
The paper introduces Emotion Omni, a model capable of empathetic speech response generation, together with a data pipeline and a new emotional dialogue dataset built with it, reducing reliance on massive training data.
Findings
Emotion Omni achieves high speech quality (UTMOS: 4.41).
It surpasses existing models in empathy (Emotion GPT Score: 3.97).
It maintains instruction-following ability without large-scale pretraining.
Abstract
With the development of speech large language models (speech LLMs), users can now interact directly with assistants via speech. However, most existing models only convert the response content into speech without fully capturing the rich emotional cues in user queries, where the same sentence may convey different meanings depending on how it is expressed. Emotional understanding is thus essential for improving human-machine interaction. Most empathetic speech LLMs rely on massive datasets, which incurs high computational cost. A key challenge is to build models that generate empathetic responses with limited data and without large-scale training. To this end, we propose Emotion Omni, a model that understands emotional content in user speech and generates empathetic responses. We further develop a data pipeline to construct a 200k emotional dialogue dataset supporting empathetic speech assistants.…
Taxonomy
Topics: Speech and dialogue systems
