Decoupling Strategy and Execution in Task-Focused Dialogue via Goal-Oriented Preference Optimization

Jingyi Xu; Xingyu Ren; Zhoupeng Shou; Yumeng Zhang; Zhiqiang You

arXiv:2602.15854 · cs.CL · February 23, 2026

TL;DR

This paper introduces GOPO, a hierarchical reinforcement learning framework that decouples strategy planning from response generation in task-oriented dialogue systems, leading to significant improvements in long-horizon task success and dialogue quality.

Contribution

The paper proposes GOPO, a novel hierarchical RL approach that separates goal optimization from response generation, improving long-term task success in dialogue systems.
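The decoupling described here, in which an Expert Agent chooses a dialogue strategy and a Customer Service Agent only realizes that strategy as a reply, can be pictured as a two-level loop. The following is a minimal sketch under stated assumptions: the strategy labels, the keyword rules in `expert_agent`, and the reply templates in `customer_service_agent` are hypothetical stand-ins for the two trained LLM policies and are not the paper's API or strategy space.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical strategy space and reply templates (illustration only).
STRATEGY_TEMPLATES: Dict[str, str] = {
    "clarify_need": "Could you tell me more about what you need?",
    "recommend_product": "Based on that, I'd suggest this item.",
    "handle_objection": "I understand; there is a discount available.",
    "close_order": "Great, I'll place the order for you.",
}


@dataclass
class Dialogue:
    turns: List[str] = field(default_factory=list)       # alternating user / agent text
    strategies: List[str] = field(default_factory=list)  # one chosen strategy per agent turn


def expert_agent(history: List[str]) -> str:
    """Hypothetical high-level policy: picks a strategy from the history.
    A trained model would score strategies; this stub uses keyword rules."""
    last = history[-1].lower() if history else ""
    if "price" in last or "expensive" in last:
        return "handle_objection"
    if "buy" in last:
        return "close_order"
    if len(history) < 2:
        return "clarify_need"
    return "recommend_product"


def customer_service_agent(history: List[str], strategy: str) -> str:
    """Hypothetical low-level policy: turns the given strategy into a reply.
    It never chooses a strategy itself."""
    if strategy not in STRATEGY_TEMPLATES:
        raise ValueError(f"unknown strategy: {strategy}")
    return STRATEGY_TEMPLATES[strategy]


def run_dialogue(user_turns: List[str]) -> Dialogue:
    d = Dialogue()
    for user in user_turns:
        d.turns.append(user)
        strategy = expert_agent(d.turns)                   # planning step
        reply = customer_service_agent(d.turns, strategy)  # execution step
        d.strategies.append(strategy)
        d.turns.append(reply)
    return d


if __name__ == "__main__":
    demo = run_dialogue(["Hi, I need running shoes", "That's too expensive"])
    for s in demo.strategies:
        print(s)
```

The design point the sketch isolates is that only `expert_agent` decides *what* to do at each turn, so a long-horizon training signal can be applied to that policy alone, while `customer_service_agent` is judged only on how faithfully its reply follows the selected strategy.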

Findings

1. GOPO improves TSE on Mgshop by 7.7% over PPO and by 10.3% over Memento.
2. A 14B model trained with GOPO outperforms Qwen-235B and GPT-5.2 on TSE.
3. Ablation studies highlight the importance of the Expert Agent for long-horizon optimization.

Abstract

Large language models show potential in task-oriented dialogue systems, yet existing training methods often rely on token-level likelihood or preference optimization, which poorly align with long-horizon task success. To address this, we propose Goal-Oriented Preference Optimization (GOPO), a hierarchical reinforcement learning framework that decouples strategy planning from response generation via an Expert Agent and a Customer Service Agent. The Expert Agent optimizes multi-turn goal preferences at the dialogue-trajectory level, while the Customer Service Agent generates responses strictly aligned with the selected strategy. We evaluate GOPO on public benchmarks and e-commerce customer service datasets, and introduce Task-focused Sequential Engagement (TSE), a sequence-level metric derived from real e-commerce interaction data. On the Mgshop dataset, GOPO improves TSE by 7.7% and…
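The abstract states that the Expert Agent optimizes goal preferences at the dialogue-trajectory level rather than per token, but this summary does not give GOPO's objective. As an illustration of what a trajectory-level preference loss can look like, the sketch below swaps in the standard Direct Preference Optimization (DPO) loss applied to whole trajectories. It assumes that a trajectory's log-probability is the sum of per-turn strategy log-probabilities and that a reference policy is available; the function names and `beta` default are hypothetical, and this is not presented as the paper's method.

```python
import math
from typing import List


def trajectory_logprob(turn_logprobs: List[float]) -> float:
    """Assumed trajectory log-probability: the sum of per-turn strategy log-probs."""
    return sum(turn_logprobs)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def trajectory_preference_loss(
    policy_win: List[float],
    policy_lose: List[float],
    ref_win: List[float],
    ref_lose: List[float],
    beta: float = 0.1,
) -> float:
    """DPO-style loss over a pair of whole dialogues:
    -log sigmoid(beta * margin), where margin compares the policy-vs-reference
    log-ratio of the preferred (goal-achieving) trajectory with that of the
    dispreferred one."""
    win_ratio = trajectory_logprob(policy_win) - trajectory_logprob(ref_win)
    lose_ratio = trajectory_logprob(policy_lose) - trajectory_logprob(ref_lose)
    margin = win_ratio - lose_ratio
    # -log(sigmoid(z)) equals softplus(-z)
    return softplus(-beta * margin)


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, so the loss is log(2).
    print(trajectory_preference_loss([-1.0, -1.0], [-2.0, -2.0], [-1.0, -1.0], [-2.0, -2.0]))
```

Because the margin is computed from summed log-probabilities over every turn, a single preference label on the final outcome of a dialogue shifts credit across all of the strategy choices that produced it, which is the long-horizon property the abstract contrasts with token-level training.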

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Multimodal Machine Learning Applications