A Subgoal-driven Framework for Improving Long-Horizon LLM Agents
Taiyi Wang, Sian Gooding, Florian Hartmann, Oriana Riva, Edward Grefenstette

TL;DR
This paper introduces a subgoal-driven agent framework and MiRA, a milestone-based reinforcement learning method, to improve long-horizon planning in LLM agents. Both raise success rates on web navigation tasks, and MiRA lifts an open model from 6.4% to 43.0%.
Contribution
The paper presents an agent framework that uses subgoal decomposition for online planning, together with a milestone-based RL training approach (MiRA). The resulting agent outperforms GPT-4-Turbo, GPT-4o, and WebRL in success rate.
Findings
Real-time planning improves success rate by ~10% on WebArena-Lite.
MiRA increases open model success rate from 6.4% to 43.0%.
Outperforms GPT-4-Turbo, GPT-4o, and WebRL in success rate.
Abstract
Large language model (LLM)-based agents have emerged as powerful autonomous controllers for digital environments, including mobile interfaces, operating systems, and web browsers. Web navigation, for example, requires handling dynamic content and long sequences of actions, making it particularly challenging. Existing LLM-based agents struggle with long-horizon planning in two main ways. During online execution, they often lose track as new information arrives, lacking a clear and adaptive path toward the final goal. This issue is further exacerbated during reinforcement learning (RL) fine-tuning, where sparse and delayed rewards make it difficult for agents to identify which actions lead to success, preventing them from maintaining coherent reasoning over extended tasks. To address these challenges, we propose two contributions. First, we introduce an agent framework that leverages…
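The contrast between sparse, delayed rewards and milestone-based rewards can be illustrated with a minimal sketch. This is not the paper's MiRA implementation: the function names, the fixed `bonus` value, and the idea of marking milestones by step index are all assumptions made for illustration. It only shows why intermediate rewards give earlier actions a learning signal even when the episode ultimately fails.

```python
from typing import Iterable, List


def sparse_rewards(num_steps: int, success: bool) -> List[float]:
    """Reward only at the final step: 1.0 on success, otherwise all zeros."""
    rewards = [0.0] * num_steps
    if num_steps > 0 and success:
        rewards[-1] = 1.0
    return rewards


def milestone_rewards(
    num_steps: int,
    milestone_steps: Iterable[int],
    success: bool,
    bonus: float = 0.5,  # hypothetical bonus size, not taken from the paper
) -> List[float]:
    """Sparse terminal reward plus a bonus at each step where a subgoal was reached."""
    rewards = sparse_rewards(num_steps, success)
    for t in set(milestone_steps):
        if 0 <= t < num_steps:
            rewards[t] += bonus
    return rewards


def discounted_returns(rewards: List[float], gamma: float = 0.5) -> List[float]:
    """Compute the discounted return G_t = r_t + gamma * G_{t+1} for every step."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # A failed 4-step episode: the sparse signal is zero everywhere,
    # while the milestone signal still credits the steps that reached subgoals.
    print(discounted_returns(sparse_rewards(4, success=False)))
    print(discounted_returns(milestone_rewards(4, [1, 2], success=False)))
```

In the failed episode, the sparse variant yields zero return at every step, whereas the milestone variant assigns nonzero return to the steps leading up to each reached subgoal.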
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Machine Learning in Healthcare
