PEER: Unified Process-Outcome Reinforcement Learning for Structured Empathetic Reasoning
Yunxiao Wang, Meng Liu, Kaiyu Jiang, Bin Wen, Fan Yang, Tingting Gao, Lizi Liao

TL;DR
PEER introduces a structured, psychology-informed reinforcement learning framework for emotional support conversations, improving empathy and human-likeness while addressing challenges like reward unreliability and response repetition.
Contribution
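It proposes a three-step structured empathetic reasoning process; SER, a new dataset with step-level correctness labels and pairwise response preferences; and PEER, which trains the model with GRPO using UnifiReward, a unified process-outcome reward model that scores both the reasoning steps and the final reply.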
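The sketch below shows one way a group-based policy-optimization method such as GRPO could consume a reward that mixes a process signal (how many reasoning steps were judged correct) with an outcome signal (a score for the final reply). The blending rule, the `process_weight` parameter, and the `Rollout`, `unified_reward`, and `group_relative_advantages` names are hypothetical and are not taken from the paper. Only the group-relative standardization of rewards follows the general GRPO recipe.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Rollout:
    # Hypothetical: per-step correctness judgments for the reasoning steps.
    step_correct: list
    # Hypothetical: reward-model score for the final reply, in [0, 1].
    outcome_score: float


def unified_reward(r: Rollout, process_weight: float = 0.5) -> float:
    """Hypothetical blend: fraction of correct steps mixed with the outcome score."""
    process = sum(r.step_correct) / len(r.step_correct)
    return process_weight * process + (1.0 - process_weight) * r.outcome_score


def group_relative_advantages(rewards: list, eps: float = 1e-6) -> list:
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(x - mean) / (std + eps) for x in rewards]


# Usage: three sampled responses to the same conversation form one group.
group = [
    Rollout([True, True, True], 0.9),
    Rollout([True, False, True], 0.4),
    Rollout([False, False, True], 0.2),
]
rewards = [unified_reward(r) for r in group]
advantages = group_relative_advantages(rewards)
```

Because advantages are computed relative to the other samples in the group, a response is rewarded only for being better than its siblings, which removes the need for a separate learned value function.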
Findings
Enhanced empathy and strategy alignment in responses.
Improved human-likeness without reducing diversity.
Effective reduction of repetitive outputs.
Abstract
Emotional support conversations require more than fluent responses. Supporters need to understand the seeker's situation and emotions, adopt an appropriate strategy, and respond in a natural, human-like manner. Despite advances in large language models, current systems often lack structured, psychology-informed reasoning. Additionally, it is challenging to enhance these systems through reinforcement learning because of unreliable reward signals. Moreover, reinforcement fine-tuning can amplify repetitive response patterns. We propose structured empathetic reasoning, which breaks support into three steps: conversation history analysis, multimodal emotional state inference, and strategy selection, prior to generating the final reply. To implement this, we introduce SER, a fine-grained dataset with step-level correctness labels and pairwise response preferences. We then present PEER, which…
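The three-step structure described above (history analysis, emotional state inference, strategy selection, then the reply) can be pictured as a tagged output that a training or evaluation pipeline checks before scoring. The following is a minimal sketch under stated assumptions: the tag names, the `StructuredResponse` fields, and the `render`/`parse` helpers are hypothetical illustrations, not the SER format or PEER's actual output schema.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical tag names for the three reasoning steps; the real format may differ.
STEPS = ("history_analysis", "emotion_inference", "strategy_selection")


@dataclass
class StructuredResponse:
    history_analysis: str
    emotion_inference: str
    strategy_selection: str
    reply: str


def render(resp: StructuredResponse) -> str:
    """Serialize the reasoning steps, in order, followed by the final reply."""
    parts = [f"<{s}>{getattr(resp, s)}</{s}>" for s in STEPS]
    parts.append(f"<reply>{resp.reply}</reply>")
    return "\n".join(parts)


def parse(text: str) -> Optional[StructuredResponse]:
    """Return the structured fields, or None if a step is missing or out of order."""
    fields, last_end = {}, -1
    for tag in STEPS + ("reply",):
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        if m is None or m.start() < last_end:
            return None  # missing step, or steps not in the required order
        fields[tag] = m.group(1).strip()
        last_end = m.end()
    return StructuredResponse(**fields)


# Usage: a well-formed response round-trips through render/parse.
example = StructuredResponse(
    history_analysis="Seeker lost their job last week and feels stuck.",
    emotion_inference="Anxious and discouraged; their voice sounds tired.",
    strategy_selection="Reflection of feelings, then a gentle open question.",
    reply="That sounds really exhausting. What has been weighing on you most?",
)
text = render(example)
```

A format check like this gives a cheap, deterministic gate: malformed outputs can be rejected or penalized before any learned reward model is queried.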
