Listening to the Echo: User-Reaction Aware Policy Optimization via Scalar-Verbal Hybrid Reinforcement Learning
Jing Ye, Xinpei Zhao, Lu Xiang, Yaping Zhang, Chengqing Zong

TL;DR
This paper introduces RAPO, a reinforcement learning framework for dialogue systems that leverages user reactions and natural language feedback to improve emotional support interactions, addressing the limitations of scalar reward signals.
Contribution
RAPO is a novel framework that uses user reactions and verbal feedback to optimize dialogue policies, enhancing emotional support effectiveness beyond traditional scalar rewards.
Findings
RAPO outperforms baseline RL methods in emotional support tasks.
RAPO uses simulated user responses to generate dense natural-language feedback.
RAPO increases positive emotional shifts in dialogue interactions.
Abstract
While current emotional support dialogue systems typically rely on expert-defined scalar rewards for alignment, these signals suffer from severe information sparsity. They cannot explain why a response failed or how to adapt to dynamic user states, often diverging from the actual goal of facilitating positive emotional shifts. In practice, the most direct and reliable learning signal emerges from the user's continuous reactions during ongoing interaction. We therefore propose Reaction Aware Policy Optimization (RAPO), a framework that optimizes over interaction consequences rather than rubric scores. RAPO treats dialogue as a reaction-driven process and utilizes simulated user responses to generate dense natural-language feedback through three core components: Hindsight Dialogue Selection, which isolates pivotal turns that meaningfully alter user emotional trajectories; Generative…
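As a rough illustration of the Hindsight Dialogue Selection idea described above (isolating turns that meaningfully alter the user's emotional trajectory), the sketch below keeps only turns whose emotional shift exceeds a threshold. Everything in it is an assumption made for illustration and is not taken from the paper: the `Turn` structure, the scalar emotion scores, the `select_pivotal_turns` helper, and the threshold value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One system turn plus hypothetical emotion scores around it."""
    index: int
    response: str
    emotion_before: float  # assumed scalar score; higher means more positive
    emotion_after: float   # assumed score after the simulated user reacts


def select_pivotal_turns(turns, threshold=0.3):
    """Keep turns whose absolute emotional shift meets the threshold.

    This is an illustrative filter, not the paper's selection rule.
    """
    return [
        t for t in turns
        if abs(t.emotion_after - t.emotion_before) >= threshold
    ]


# Toy dialogue with made-up scores.
dialogue = [
    Turn(0, "That sounds really hard.", 0.2, 0.25),
    Turn(1, "You should just get over it.", 0.25, -0.3),
    Turn(2, "What helped you cope last time?", -0.3, 0.2),
]

pivotal = select_pivotal_turns(dialogue)
print([t.index for t in pivotal])  # prints [1, 2]
```

In this toy example, turn 0 barely moves the user's state and is dropped, while turns 1 and 2 (one harmful, one helpful) are kept as candidates for feedback. How RAPO actually measures emotional trajectories and selects turns is not specified in the truncated abstract.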
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Emotion and Mood Recognition
