PhGPO: Pheromone-Guided Policy Optimization for Long-Horizon Tool Planning
Yu Li, Guangfeng Cai, Shengtian Yang, Han Luo, Shuo Han, Xu He, Dong Li, Lei Feng

TL;DR
This paper introduces PhGPO, a method inspired by ant colony optimization that reuses historically successful trajectories to improve long-horizon tool planning in LLM agents, guiding policy optimization with learned pheromone signals.
Contribution
The paper proposes a new pheromone-guided approach for long-horizon tool planning, capturing reusable transition patterns to enhance exploration and policy learning.
Findings
PhGPO significantly improves long-horizon tool planning performance.
The learned pheromone effectively guides policy optimization.
Experimental results validate the approach's effectiveness.
Abstract
Recent advancements in Large Language Model (LLM) agents have demonstrated strong capabilities in executing complex tasks through tool use. However, long-horizon multi-step tool planning is challenging because the exploration space suffers from a combinatorial explosion. In this scenario, even when a correct tool-use path is found, it is usually treated only as an immediate reward for the current training step and provides no reusable information for subsequent training. In this paper, we argue that historically successful trajectories contain reusable tool-transition patterns, which can be leveraged throughout the whole training process. Inspired by ant colony optimization, where historically successful paths are reflected by the pheromone, we propose Pheromone-Guided Policy Optimization (PhGPO), which learns a trajectory-based transition pattern (i.e., pheromone) from historical…
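To make the ant colony analogy concrete, here is a minimal sketch of a classic ACO-style pheromone table over tool-to-tool transitions. It is not the paper's method: the abstract is truncated before it describes how PhGPO learns the pheromone or combines it with policy optimization. The class name `PheromoneTable`, its parameters (`evaporation`, `deposit`, `initial`), and the tool names are all hypothetical and used only for illustration.

```python
class PheromoneTable:
    """Textbook ACO-style pheromone over (tool, next_tool) transitions.

    Illustrative assumption only; PhGPO's actual pheromone is learned
    and is not specified in the visible part of the abstract.
    """

    def __init__(self, evaporation=0.1, deposit=1.0, initial=1.0):
        self.evaporation = evaporation  # fraction of pheromone lost per update
        self.deposit = deposit          # amount added per successful transition
        self.initial = initial          # default level for unseen transitions
        self.tau = {}                   # (tool, next_tool) -> pheromone level

    def update(self, successful_trajectories):
        """Evaporate all stored pheromone, then reinforce observed transitions."""
        for edge in self.tau:
            self.tau[edge] *= 1.0 - self.evaporation
        for trajectory in successful_trajectories:
            # Each consecutive pair of tools in a successful path is one transition.
            for edge in zip(trajectory, trajectory[1:]):
                self.tau[edge] = self.tau.get(edge, self.initial) + self.deposit

    def next_tool_probs(self, current, candidates):
        """Normalize pheromone levels into a distribution over candidate tools."""
        weights = [self.tau.get((current, c), self.initial) for c in candidates]
        total = sum(weights)
        return {c: w / total for c, w in zip(candidates, weights)}


# Hypothetical usage: one successful three-step tool path.
table = PheromoneTable(evaporation=0.5)
table.update([["search", "fetch", "parse"]])
probs = table.next_tool_probs("search", ["fetch", "parse"])
```

In this sketch, transitions that appear in successful trajectories accumulate pheromone and become more likely to be proposed again, while evaporation lets stale patterns fade. A distribution like `probs` could in principle bias exploration during training, but how PhGPO actually does this is left to the paper.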
Taxonomy
Topics: Machine Learning and Data Classification · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications
