Dialogue Model Optimization via Agent Game and Adaptive Tree-based GRPO

Kun Peng; Conghui Tan; Yu Liu; Guohua Tang; Zhongqian Sun; Wei Yang; Zining Zhu; Lei Jiang; Yanbing Liu; Hao Peng

arXiv:2602.08533 · cs.AI · February 11, 2026

Open Access

TL;DR

This paper introduces a long-horizon RL framework for open-ended dialogue agents. It combines online personalization with Adaptive Tree-based Group Relative Policy Optimization (AT-GRPO), aiming for engaging, personalized conversations over many turns at lower computational cost.

Contribution

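The paper proposes AT-GRPO, a long-horizon RL method that treats dialogue trajectories as trees with adaptive observation ranges. It pairs this with a two-agent game in which a user agent builds the training environment through style mimicry and active termination, so that personalization does not rest on pre-collected user data.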

Findings

01

Outperforms existing dialogue models in engagement and personalization.

02

Achieves higher sample efficiency and robustness in experiments.

03

Reduces computational overhead from exponential to polynomial in dialogue length.

Abstract

Open-ended dialogue agents aim to deliver engaging, personalized interactions by adapting to users' traits, but existing methods face critical limitations: over-reliance on pre-collected user data, and short-horizon biases in reinforcement learning (RL) that neglect long-term dialogue value. To address these, we propose a novel long-horizon RL framework integrating online personalization with Adaptive Tree-based Group Relative Policy Optimization (AT-GRPO). Adopting a two-agent game paradigm, a user agent constructs dynamic environments via style mimicry (learning user-specific conversational traits) and active termination (predicting turn-level termination probabilities as immediate rewards), forming an iterative cycle that drives the dialogue agent to deepen interest exploration. AT-GRPO reinterprets dialogue trajectories as trees and introduces adaptive observation ranges. Unlike…
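The group-relative part of the method can be illustrated with a minimal sketch. It assumes the standard GRPO normalization, in which each sampled response is scored against the mean and standard deviation of its own group, and it assumes a reward of `1 - p_terminate` per turn; the abstract says termination probabilities serve as immediate rewards but does not give the exact form. The function names and example probabilities are hypothetical, and the tree structure and adaptive observation range of AT-GRPO are not modeled here.

```python
from statistics import mean, pstdev


def termination_rewards(p_terminate):
    """Turn termination probabilities into immediate rewards.

    Assumption for illustration: reward = 1 - p, so a reply that makes the
    user agent less likely to end the dialogue scores higher.
    """
    return [1.0 - p for p in p_terminate]


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style advantage: normalize rewards within one group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampled group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four candidate replies sampled at the same dialogue turn,
# each scored by the user agent's predicted termination probability.
p_terminate = [0.10, 0.40, 0.25, 0.65]
rewards = termination_rewards(p_terminate)
advantages = group_relative_advantages(rewards)

for p, a in zip(p_terminate, advantages):
    print(f"p_terminate={p:.2f}  advantage={a:+.3f}")
```

Because the advantages are centered within the group, replies that keep the simulated user engaged longer than their siblings receive positive advantages and the rest receive negative ones, with no separate value network required.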

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling · Recommender Systems and Techniques