MICA: Multi-granularity Intertemporal Credit Assignment for Long-Horizon Emotional Support Dialogue
Naifan Zhang, Ruihan Sun, Jinwei Su, Hengjie Yang, Zhengyuan Pan, Zhaohan Chen, Xiaofan Zhang

TL;DR
This paper introduces MICA, a critic-free reinforcement learning framework for multi-turn emotional support dialogue. It assigns credit to individual turns without a learned critic and without relying on matched-state step-wise comparisons.
Contribution
MICA provides a multi-granularity intertemporal credit assignment method for multi-turn RL in emotional support dialogue. It requires no critic and adds no additional rollout cost.
Findings
MICA outperforms previous methods like GRPO and REINFORCE++ on multiple benchmarks.
MICA achieves up to +43.2 improvement on EMPA.
MICA is robust and adds no additional rollout cost.
Abstract
Reinforcement learning (RL) for large language models (LLMs) has shown strong performance in single-turn tasks, but extending it to multi-turn interaction remains challenging due to sparse rewards and poor per-turn credit assignment. In emotional support dialogues, responses shape future user states, so matched-state step-wise comparison is unavailable, while trajectory-level supervision is insufficient. We propose MICA (Multi-granularity Intertemporal Credit Assignment), a critic-free RL framework for multi-turn emotional support tasks. MICA derives both immediate and delayed credit from a shared potential function over the user's structured support state. Incremental Distance Reward measures the per-turn decrease in residual distance to the target state, while its Monte Carlo return captures delayed effects. After scope-specific normalization, the two signals form a mixed advantage…
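To make the credit signals described in the abstract concrete, here is a minimal Python sketch, not the authors' implementation. It assumes the user's structured support state is a numeric vector, that the potential is the Euclidean distance to a target state, and that the two signals are combined with a fixed weight `alpha` after a plain z-score over one trajectory. The function names, the distance metric, the discount `gamma`, the normalization scope, and `alpha` are all hypothetical placeholders; the abstract is truncated before it specifies how the mixed advantage is actually formed.

```python
import numpy as np

def incremental_distance_rewards(states, target):
    """Per-turn decrease in residual distance to the target state.

    states: array of shape (T+1, d) holding s_0..s_T (assumed vector encoding).
    target: array of shape (d,).
    """
    dist = np.linalg.norm(states - target, axis=1)  # distance at each turn
    return dist[:-1] - dist[1:]                     # r_t = d(s_t) - d(s_{t+1})

def monte_carlo_returns(rewards, gamma=1.0):
    """Discounted sum of future incremental rewards (delayed credit)."""
    returns = np.zeros_like(rewards, dtype=float)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def zscore(x, eps=1e-8):
    """Placeholder normalization; the paper's scopes are not given here."""
    return (x - x.mean()) / (x.std() + eps)

def mixed_advantage(rewards, returns, alpha=0.5):
    """Assumed convex mix of normalized immediate and delayed credit."""
    return alpha * zscore(rewards) + (1.0 - alpha) * zscore(returns)

# Toy trajectory: the user state moves toward the target at the origin.
states = np.array([[2.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 0.0]])
target = np.zeros(2)
rewards = incremental_distance_rewards(states, target)
returns = monte_carlo_returns(rewards)
advantage = mixed_advantage(rewards, returns)
```

With `gamma=1.0` the rewards telescope, so the return at the first turn equals the total reduction in distance over the dialogue. This is one reason a shared potential is convenient: the immediate and delayed signals come from the same quantity and need no extra rollouts.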
