PhGPO: Pheromone-Guided Policy Optimization for Long-Horizon Tool Planning

Yu Li; Guangfeng Cai; Shengtian Yang; Han Luo; Shuo Han; Xu He; Dong Li; Lei Feng

arXiv:2602.13691·cs.AI·February 17, 2026

Open Access

TL;DR

This paper introduces PhGPO, a method inspired by ant colony optimization that learns pheromone signals from historically successful trajectories and uses them to guide policy optimization for long-horizon tool planning in LLM agents.

Contribution

The paper proposes a new pheromone-guided approach for long-horizon tool planning, capturing reusable transition patterns to enhance exploration and policy learning.

Findings

01. PhGPO significantly improves long-horizon tool planning performance.

02. The learned pheromone effectively guides policy optimization.

03. Experimental results validate the approach's effectiveness.

Abstract

Recent advancements in Large Language Model (LLM) agents have demonstrated strong capabilities in executing complex tasks through tool use. However, long-horizon multi-step tool planning is challenging, because the exploration space suffers from a combinatorial explosion. In this scenario, even when a correct tool-use path is found, it is usually considered an immediate reward for current training, which would not provide any reusable information for subsequent training. In this paper, we argue that historically successful trajectories contain reusable tool-transition patterns, which can be leveraged throughout the whole training process. Inspired by ant colony optimization where historically successful paths can be reflected by the pheromone, we propose Pheromone-Guided Policy Optimization (PhGPO), which learns a trajectory-based transition pattern (i.e., pheromone) from historical…
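
The abstract is truncated here, so the paper's actual pheromone formulation is not shown. As a rough illustration of the underlying idea only, the sketch below applies the classic ant colony optimization update (evaporate, then deposit along successful paths) to tool-to-tool transitions. The function names, the tool names, and the `evaporation`, `deposit`, and `floor` parameters are all assumptions made for this example; none of them come from the paper, and PhGPO itself learns its pheromone rather than counting transitions this way.

```python
from collections import defaultdict


def update_pheromone(pheromone, successful_trajectories,
                     evaporation=0.1, deposit=1.0):
    """Classic ACO-style update over tool transitions (illustrative only).

    pheromone: dict mapping (from_tool, to_tool) -> float
    successful_trajectories: list of tool-name sequences that solved a task
    """
    # Evaporation: decay every existing edge so stale patterns fade over time.
    for edge in list(pheromone):
        pheromone[edge] *= (1.0 - evaporation)
    # Deposit: reinforce each consecutive transition on a successful path.
    for trajectory in successful_trajectories:
        for src, dst in zip(trajectory, trajectory[1:]):
            pheromone[(src, dst)] += deposit
    return pheromone


def transition_probs(pheromone, current_tool, candidates, floor=1e-3):
    """Turn pheromone levels into a distribution over candidate next tools.

    The small floor (an assumption) keeps unseen transitions explorable.
    """
    weights = [pheromone.get((current_tool, c), 0.0) + floor
               for c in candidates]
    total = sum(weights)
    return {c: w / total for c, w in zip(candidates, weights)}


# Hypothetical tool names, used only to exercise the functions.
pheromone = defaultdict(float)
update_pheromone(pheromone, [
    ["search", "fetch", "summarize"],
    ["search", "fetch", "translate"],
])
probs = transition_probs(pheromone, "search", ["fetch", "summarize"])
```

In this toy setup the transition from `search` to `fetch` appears in both successful trajectories, so it receives most of the probability mass when choosing the next tool after `search`. How PhGPO actually combines such a signal with policy optimization is described in the full paper, not in this excerpt.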


Taxonomy

Topics: Machine Learning and Data Classification · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications