Empathy Level Alignment via Reinforcement Learning for Empathetic Response Generation
Hui Ma, Bo Zhang, Bo Xu, Jian Wang, Hongfei Lin, and Xiao Sun

TL;DR
This paper introduces EmpRL, a reinforcement learning framework that improves empathetic response generation in dialogue systems by aligning empathy levels using a specialized reward function and fine-tuning a pre-trained language model.
Contribution
It proposes a novel empathy reward function with three mechanisms and fine-tunes a T5 model using reinforcement learning to generate more empathetic responses.
Findings
Significantly improves response empathy levels.
Enhances similarity between generated and target responses.
Produces responses covering affective and cognitive empathy.
Abstract
Empathetic response generation, aiming to understand the user's situation and feelings and respond empathically, is crucial in building human-like dialogue systems. Traditional approaches typically employ maximum likelihood estimation as the optimization objective during training, yet fail to align the empathy levels between generated and target responses. To this end, we propose an empathetic response generation framework using reinforcement learning (EmpRL). The framework develops an effective empathy reward function and generates empathetic responses by maximizing the expected reward through reinforcement learning. EmpRL utilizes the pre-trained T5 model as the generator and further fine-tunes it to initialize the policy. To align the empathy levels between generated and target responses within a given context, an empathy reward function containing three empathy communication…
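The idea of rewarding agreement in empathy level between a generated and a target response, then maximizing the expected reward, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the function names, the integer level scale (`max_level=2`), the absolute-gap scoring, and the REINFORCE-style surrogate loss are all hypothetical stand-ins, and the paper's actual reward function and optimizer may differ.

```python
from typing import Sequence


def empathy_alignment_reward(generated_levels: Sequence[int],
                             target_levels: Sequence[int],
                             max_level: int = 2) -> float:
    """Hypothetical reward in [0, 1].

    Each position holds the empathy level assigned to one mechanism
    (the integer scale 0..max_level is an assumption). The reward is 1.0
    when every level of the generated response matches the target and
    falls linearly with the mean normalized gap.
    """
    if len(generated_levels) != len(target_levels):
        raise ValueError("level vectors must have the same length")
    if not generated_levels:
        raise ValueError("at least one mechanism is required")
    gaps = [abs(g - t) / max_level
            for g, t in zip(generated_levels, target_levels)]
    return 1.0 - sum(gaps) / len(gaps)


def reinforce_loss(log_prob: float, reward: float, baseline: float = 0.0) -> float:
    """Single-sample surrogate loss; minimizing it raises the expected reward.

    log_prob is the log-probability the policy assigned to the sampled
    response. This is a plain REINFORCE-style estimate, used here only
    as an illustration.
    """
    return -(reward - baseline) * log_prob


if __name__ == "__main__":
    # Three mechanisms; the generated response is one level short on the second.
    r = empathy_alignment_reward([2, 0, 1], [2, 1, 1])
    print(round(r, 4), reinforce_loss(log_prob=-2.0, reward=r))
```

In this sketch a response whose levels all match the target earns the full reward, so the policy is pushed toward the target's empathy level rather than toward maximal empathy in every case.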
Taxonomy
Topics: Reinforcement Learning in Robotics · Behavioral and Psychological Studies
Methods: Gated Linear Unit · Attention Is All You Need · Byte Pair Encoding · Softmax · Dense Connections · Inverse Square Root Schedule · Dropout · Linear Layer · Attention Dropout
