FreezeEmpath: Efficient Training for Empathetic Spoken Chatbots with Frozen LLMs

Yun Hong; Yan Zhou; Yang Feng

arXiv:2604.18159 · cs.CL · April 21, 2026

TL;DR

FreezeEmpath introduces an efficient training method for empathetic spoken chatbots. It keeps the large language model (LLM) frozen and trains only on existing speech data, which improves the emotional expressiveness of the generated speech and performance on dialogue and speech emotion recognition tasks.

Contribution

It presents a novel end-to-end training approach that keeps the LLM frozen, which avoids catastrophic forgetting. Instead of costly empathetic speech instruction data, it uses only existing speech instruction data and speech emotion recognition (SER) data.
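This page does not describe FreezeEmpath's architecture in detail, so the following is only a hypothetical toy sketch of the general parameter-freezing idea. It is not the authors' method. It assumes a trainable "speech adapter" placed in front of a frozen "LLM" backbone, with each reduced to a single scalar weight. The names `speech_adapter.w` and `llm.w`, along with the helper functions, are illustrative inventions. Gradient descent updates only the trainable parameter, so the backbone weight stays exactly as it was.

```python
from dataclasses import dataclass


@dataclass
class Param:
    value: float
    trainable: bool


def build_toy_model():
    # Hypothetical two-parameter stand-in: a trainable speech adapter
    # followed by a frozen "LLM" backbone (one scalar weight each).
    return {
        "speech_adapter.w": Param(0.5, trainable=True),
        "llm.w": Param(2.0, trainable=False),  # frozen: never updated
    }


def forward(model, x):
    h = model["speech_adapter.w"].value * x  # adapter projects the speech feature
    return model["llm.w"].value * h          # frozen backbone consumes it


def train_step(model, x, target, lr=0.01):
    a = model["speech_adapter.w"].value
    w = model["llm.w"].value
    err = forward(model, x) - target
    # Analytic gradients of the squared-error loss 0.5 * err**2.
    grads = {"speech_adapter.w": err * w * x, "llm.w": err * a * x}
    for name, p in model.items():
        if p.trainable:  # frozen parameters are skipped entirely
            p.value -= lr * grads[name]
    return 0.5 * err ** 2


if __name__ == "__main__":
    model = build_toy_model()
    frozen_before = model["llm.w"].value
    losses = [train_step(model, x=1.0, target=3.0) for _ in range(200)]
    print(f"first loss={losses[0]:.4f}, last loss={losses[-1]:.6f}")
    print("llm.w unchanged:", model["llm.w"].value == frozen_before)
```

In this toy, the loss falls because only the adapter moves, while the "LLM" weight remains bit-identical. Freezing preserves whatever the backbone already knows by construction. The trade-off is that all task adaptation must be carried by the trainable modules.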

Findings

01 Outperforms existing empathetic models on dialogue and SER tasks

02 Generates emotionally expressive speech

03 Requires only speech instruction data and SER data for training

Abstract

Empathy is essential for fostering natural interactions in spoken dialogue systems, as it enables machines to recognize the emotional tone of human speech and deliver empathetic responses. Recent research has made significant progress in developing empathetic spoken chatbots based on large language models (LLMs). However, several challenges remain when training such models, including reliance on costly empathetic speech instruction data and a lack of emotional expressiveness in the generated speech. Fine-tuning the LLM with cross-modal empathetic instruction data may also lead to catastrophic forgetting and a degradation of its general capability. To address these challenges, we propose FreezeEmpath, an end-to-end empathetic spoken chatbot trained in a simple and efficient manner. The entire training process relies solely on existing speech instruction data and speech emotion…