Convert Language Model into a Value-based Strategic Planner

Xiaoyu Wang; Yue Zhao; Qingqing Gu; Zhonglin Jiang; Xiaokai Chen; Yong Chen; Luo Ji

arXiv:2505.06987 · cs.CL · August 28, 2025

TL;DR

This paper introduces straQ*, a framework that uses Q-learning to turn language models into strategic planners for emotional support conversations. At each turn it selects a support strategy according to its expected long-term reward and uses that strategy to guide the model's response.
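Here, "expected long-term reward" refers to the standard reinforcement-learning return and the Q-learning update, given below in their textbook form. Reading s_t as the dialogue state, a_t as the support strategy chosen at turn t, r as a reward signal, alpha as a learning rate, and gamma as a discount factor is our assumption. The paper's exact formulation may differ.

```latex
\begin{aligned}
G_t &= \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma < 1,\\
Q(s_t, a_t) &\leftarrow Q(s_t, a_t) + \alpha \bigl[\, r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \,\bigr],\\
a_t^{*} &= \arg\max_{a} Q(s_t, a).
\end{aligned}
```

The bracketed term is the temporal-difference error. Because the target bootstraps from max over a' of Q(s_{t+1}, a'), a strategy with no immediate payoff can still receive a high value if it leads to better states later.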

Contribution

The paper presents a novel plug-and-play framework that enables LLMs to plan strategically for emotional support conversations using reinforcement learning.

Findings

01. straQ* outperforms baseline methods on ESC tasks.

02. Q-learning enhances LLMs' ability to plan long-term strategies.

03. The framework demonstrates significant improvements on ESC datasets.

Abstract

Emotional support conversation (ESC) aims to alleviate individuals' emotional distress through effective conversation. Although large language models (LLMs) have made remarkable progress on ESC, most existing studies may not formulate the problem from a state-model perspective and therefore provide suboptimal solutions for long-term satisfaction. To address this issue, we apply Q-learning to LLMs and propose a framework called straQ*. Our framework allows a plug-and-play LLM to bootstrap its planning during ESC, determine the optimal strategy based on long-term returns, and finally guide the LLM's response. Extensive experiments on ESC datasets suggest that straQ* outperforms many baselines, including direct inference, self-refine, chain of thought, finetuning, and finite state machines.
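The idea of choosing "the optimal strategy based on long-term returns" can be illustrated with a minimal tabular Q-learning sketch. This is not the straQ* implementation. The three strategy labels, the toy three-state environment and its rewards, and the `build_prompt` template are all hypothetical. straQ* also applies Q-learning to an LLM, not to a lookup table. The sketch only shows how bootstrapped Q-values can favour a strategy whose payoff arrives later.

```python
from typing import Dict, Tuple

# Hypothetical strategy labels, loosely modeled on ESC-style support strategies.
STRATEGIES = ("Question", "Reflection", "Suggestion")

# Toy deterministic environment (illustrative only, NOT from the paper):
# state 0 = seeker still distressed, 1 = seeker feels understood, 2 = terminal.
# Maps (state, strategy) -> (reward, next_state).
TRANSITIONS: Dict[Tuple[int, str], Tuple[float, int]] = {
    (0, "Question"): (0.0, 0),
    (0, "Reflection"): (0.0, 1),
    (0, "Suggestion"): (0.2, 2),  # premature advice: small reward, dialogue ends
    (1, "Question"): (0.0, 1),
    (1, "Reflection"): (0.0, 1),
    (1, "Suggestion"): (1.0, 2),  # advice after empathy: larger reward
}
TERMINAL = 2


def q_learning(alpha: float = 0.5, gamma: float = 0.9,
               sweeps: int = 500) -> Dict[int, Dict[str, float]]:
    """Apply the Q-learning update to every (state, strategy) pair, `sweeps` times."""
    q = {s: {a: 0.0 for a in STRATEGIES} for s in (0, 1)}
    for _ in range(sweeps):
        for (s, a), (r, s_next) in TRANSITIONS.items():
            # Bootstrap from the best next-state value; the terminal state is worth 0.
            future = 0.0 if s_next == TERMINAL else max(q[s_next].values())
            td_target = r + gamma * future
            q[s][a] += alpha * (td_target - q[s][a])
    return q


def plan_strategy(q: Dict[int, Dict[str, float]], state: int) -> str:
    """Pick the strategy with the highest estimated long-term return."""
    return max(q[state], key=q[state].get)


def build_prompt(history: str, strategy: str) -> str:
    """Hypothetical prompt template that conditions an LLM on the planned strategy."""
    return f"{history}\nRespond to the seeker using the strategy: {strategy}."


if __name__ == "__main__":
    q = q_learning()
    for state in (0, 1):
        # Compare against picking whichever strategy has the largest immediate reward.
        greedy = max(STRATEGIES, key=lambda a: TRANSITIONS[(state, a)][0])
        print(state, "Q-planned:", plan_strategy(q, state), "| reward-greedy:", greedy)
    print(build_prompt("Seeker: I failed my exam and feel awful.", plan_strategy(q, 0)))
```

In this toy setup, the Q-planner picks Reflection in the distressed state (Q = 0.9), while choosing by immediate reward alone would pick Suggestion (Q = 0.2), because the larger reward only becomes available after the seeker feels understood.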

Taxonomy

Topics: Mental Health via Writing · Sentiment Analysis and Opinion Mining · Emotion and Mood Recognition

Methods: Q-Learning