FreezeEmpath: Efficient Training for Empathetic Spoken Chatbots with Frozen LLMs
Yun Hong, Yan Zhou, Yang Feng

TL;DR
FreezeEmpath introduces an efficient training method for empathetic spoken chatbots that leverages frozen large language models and existing speech data, improving emotional expressiveness and task performance.
Contribution
It presents a novel end-to-end training approach that keeps the LLM frozen, which avoids catastrophic forgetting, and that uses only existing speech instruction data and speech emotion recognition data.
Findings
Outperforms existing empathetic models on dialogue and speech emotion recognition (SER) tasks
Generates emotionally expressive speech
Requires only speech instruction and SER data for training
Abstract
Empathy is essential for fostering natural interactions in spoken dialogue systems, as it enables machines to recognize the emotional tone of human speech and deliver empathetic responses. Recent research has made significant progress in developing empathetic spoken chatbots based on large language models (LLMs). However, several challenges remain when training such models, including reliance on costly empathetic speech instruction data and a lack of emotional expressiveness in the generated speech. Fine-tuning the LLM with cross-modal empathetic instruction data may also lead to catastrophic forgetting and degrade its general capabilities. To address these challenges, we propose FreezeEmpath, an end-to-end empathetic spoken chatbot trained in a simple and efficient manner. The entire training process relies solely on existing speech instruction data and speech emotion…
