Towards Open-Ended Emotional Support Conversations in LLMs via Reinforcement Learning with Future-Oriented Rewards
Ting Yang, Li Chen, Huimin Wang

TL;DR
This paper presents RLFF-ESC, a reinforcement learning framework that enables large language models to generate emotionally supportive responses by simulating future dialogue trajectories and explicitly reasoning during response generation.
Contribution
It introduces a novel end-to-end reinforcement learning approach with future-oriented rewards and explicit reasoning for open-ended emotional support conversations in LLMs.
Findings
RLFF-ESC outperforms baselines in goal completion.
It improves response quality and relevance.
It is effective across multiple datasets.
Abstract
Emotional Support Conversation (ESC) systems aim to alleviate users' emotional difficulties and provide long-term, systematic support for emotional well-being. However, most large language model (LLM)-based ESC systems rely on predefined strategies, which limits their effectiveness in complex, real-life scenarios. To enable flexible responses to diverse emotional problem scenarios, this paper introduces a novel end-to-end framework (RLFF-ESC) that directly learns enduring emotionally supportive response skills using reinforcement learning. For sustained emotional support, we first employ an LLM-based multi-agent mechanism to simulate future dialogue trajectories and collect future-oriented rewards. We then train a future-oriented reward model, which is subsequently used to train the emotional support policy model. Additionally, we incorporate an explicit reasoning process during…
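The abstract describes scoring a response by simulating how the dialogue might continue. A minimal sketch of that idea follows. The abstract does not give the reward formula, so the discounted sum, the `horizon` and `gamma` parameters, and the `simulate_turn` and `score_turn` callables are all illustrative assumptions, not the authors' implementation. In the paper's setting the simulator would be an LLM-based multi-agent mechanism; here it is any function that returns the next utterance.

```python
from typing import Callable, List


def future_oriented_reward(
    history: List[str],
    candidate: str,
    simulate_turn: Callable[[List[str]], str],   # hypothetical: returns the next simulated utterance
    score_turn: Callable[[str], float],          # hypothetical: rates one simulated utterance
    horizon: int = 3,                            # assumed number of simulated future turns
    gamma: float = 0.9,                          # assumed discount factor
) -> float:
    """Score a candidate response by rolling the dialogue forward.

    Assumption: the reward is a discounted sum of per-turn scores over
    a simulated future trajectory. The source does not specify this.
    """
    # Copy so the caller's history is never mutated.
    dialogue = list(history) + [candidate]
    total = 0.0
    for step in range(horizon):
        next_utterance = simulate_turn(dialogue)
        total += (gamma ** step) * score_turn(next_utterance)
        dialogue.append(next_utterance)
    return total


if __name__ == "__main__":
    # Stub simulator and scorer, standing in for LLM agents.
    reward = future_oriented_reward(
        history=["I have been feeling overwhelmed lately."],
        candidate="That sounds exhausting. What has been weighing on you most?",
        simulate_turn=lambda dialogue: "simulated reply",
        score_turn=lambda utterance: 1.0,
        horizon=3,
        gamma=0.5,
    )
    print(reward)
```

Rewards collected this way could serve as training targets for a reward model, which in turn would supply the signal for training the policy, matching the two-stage order the abstract outlines.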
Taxonomy
Topics: AI in Service Interactions
