Proximity-Based Multi-Turn Optimization: Practical Credit Assignment for LLM Agent Training
Yangyi Fang, Jiaye Lin, Xiaoliang Fu, Cong Qin, Haolin Shi, Chang Liu, Peilin Zhao

TL;DR
ProxMO is a practical framework for multi-turn LLM agent training that improves credit assignment by dynamically adapting to task difficulty and semantic context, leading to better performance with minimal computational overhead.
Contribution
ProxMO introduces success-rate-aware modulation and proximity-based soft aggregation, enhancing credit assignment in real-world multi-turn LLM training scenarios.
Findings
Significant performance improvements on ALFWorld and WebShop benchmarks.
Mechanisms are lightweight and compatible with existing frameworks.
Ablation studies confirm the effectiveness of each component.
Abstract
Multi-turn LLM agents are becoming pivotal to production systems, spanning customer service automation, e-commerce assistance, and interactive task management, where accurately distinguishing high-value informative signals from stochastic noise is critical for sample-efficient training. In real-world scenarios, a failure in a trivial task may reflect random instability, whereas success in a high-difficulty task signifies a genuine capability breakthrough. Yet, existing group-based policy optimization methods rigidly rely on statistical deviation within discrete batches, frequently misallocating credit when task difficulty fluctuates. To address this issue, we propose Proximity-based Multi-turn Optimization (ProxMO), a practical and robust framework engineered specifically for the constraints of real-world deployment. ProxMO integrates global context via two lightweight mechanisms:…
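The abstract says that group-based methods rely on statistical deviation within discrete batches. The sketch below illustrates this with a generic GRPO-style group-normalized advantage. In that scheme, each rollout's scalar reward is standardized against the mean and standard deviation of its own group of rollouts for the same task. This is the baseline the abstract criticizes, not ProxMO's mechanism. The function name `group_normalized_advantages` and the `eps` stabilizer are hypothetical choices made for illustration.

```python
import statistics

def group_normalized_advantages(rewards, eps=1e-6):
    """Generic group-relative advantage (hypothetical helper, GRPO-style).

    Each reward is standardized using only the statistics of its own group,
    so no information about the task's global difficulty is used.
    """
    mean = statistics.fmean(rewards)   # group mean reward
    std = statistics.pstdev(rewards)   # population std within the group
    # eps avoids division by zero when every rollout has the same reward
    return [(r - mean) / (std + eps) for r in rewards]


# Binary success rewards for two hypothetical task groups of four rollouts each
easy = group_normalized_advantages([1, 1, 1, 0])  # 3/4 succeed: an easy task
hard = group_normalized_advantages([0, 0, 0, 1])  # 1/4 succeed: a hard task
```

In this example, the single failure on the easy task gets an advantage of about -1.73. The single success on the hard task gets about +1.73. Within-group standardization therefore penalizes a possibly random slip on a trivial task exactly as strongly as it rewards a breakthrough on a difficult one. A group in which every rollout earns the same reward yields all-zero advantages. This is the kind of difficulty-blind credit allocation the abstract says ProxMO addresses.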
