Proximity-Based Multi-Turn Optimization: Practical Credit Assignment for LLM Agent Training

Yangyi Fang, Jiaye Lin, Xiaoliang Fu, Cong Qin, Haolin Shi, Chang Liu, Peilin Zhao

arXiv:2602.19225 · cs.AI · February 24, 2026

TL;DR

ProxMO is a practical framework for training multi-turn LLM agents. It improves credit assignment by adapting to task difficulty and semantic context, which raises performance while adding minimal computational overhead.

Contribution

ProxMO introduces two mechanisms, success-rate-aware modulation and proximity-based soft aggregation, that improve credit assignment in real-world multi-turn LLM agent training.

Findings

01. Significant performance improvements on the ALFWorld and WebShop benchmarks.

02. Both mechanisms are lightweight and compatible with existing training frameworks.

03. Ablation studies confirm the contribution of each component.

Abstract

Multi-turn LLM agents are becoming pivotal to production systems, spanning customer service automation, e-commerce assistance, and interactive task management, where accurately distinguishing high-value informative signals from stochastic noise is critical for sample-efficient training. In real-world scenarios, a failure in a trivial task may reflect random instability, whereas success in a high-difficulty task signifies a genuine capability breakthrough. Yet, existing group-based policy optimization methods rigidly rely on statistical deviation within discrete batches, frequently misallocating credit when task difficulty fluctuates. To address this issue, we propose Proximity-based Multi-turn Optimization (ProxMO), a practical and robust framework engineered specifically for the constraints of real-world deployment. ProxMO integrates global context via two lightweight mechanisms:…
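To make the abstract's point about "statistical deviation within discrete batches" concrete, the sketch below computes the standard group-relative advantage used by GRPO-style methods, (r - mean) / std over the rollouts of one task. Treating this as the baseline the abstract refers to is an assumption, the binary rewards are illustrative rather than data from the paper, and the code is not ProxMO. With binary rewards, a lone failure in an easy task (7 of 8 rollouts succeed) and a lone success in a hard task (1 of 8 succeed) receive advantages of identical magnitude, sqrt(7) ≈ 2.646.

```python
import math


def group_normalized_advantages(rewards, eps=1e-8):
    """Baseline group-relative advantage: (r - mean) / (std + eps) within one group.

    Illustrative sketch of GRPO-style normalization; NOT ProxMO's method.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    # Population standard deviation of this group's rewards only (no global context).
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


# Easy task: 7 of 8 rollouts succeed, so the lone failure is plausibly random noise.
easy = [1, 1, 1, 1, 1, 1, 1, 0]
# Hard task: 1 of 8 rollouts succeeds, so the lone success is plausibly a real capability gain.
hard = [1, 0, 0, 0, 0, 0, 0, 0]

adv_easy = group_normalized_advantages(easy)
adv_hard = group_normalized_advantages(hard)

print(round(adv_easy[-1], 3))  # -2.646 (penalty for the easy-task failure)
print(round(adv_hard[0], 3))   # 2.646 (reward for the hard-task success)
```

Because each group is normalized only against its own statistics, the noisy failure is penalized exactly as strongly as the informative success is rewarded. This is the kind of misallocation that the abstract says ProxMO's global-context mechanisms are meant to correct; their exact formulation is not given in the excerpt above.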
