Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning

Jinyang Wu; Shuo Yang; Changpeng Yang; Yuhao Shen; Shuai Zhang; Zhengqi Wen; Jianhua Tao

arXiv:2601.20209 · cs.LG · January 29, 2026

Open Access · 3 Models

TL;DR

Spark introduces a strategic, dynamic branching method for reinforcement learning that efficiently allocates resources at critical decision points, improving long-horizon agent training and generalization with fewer samples.

Contribution

The paper presents Spark, a novel framework that adaptively branches at key states for resource-efficient exploration, reducing reliance on human priors and enhancing agent performance.

Findings

01. Achieves higher success rates with fewer training samples.

02. Demonstrates robust generalization in unseen scenarios.

03. Outperforms existing methods in diverse tasks.

Abstract

Reinforcement learning has empowered large language models to act as intelligent agents, yet training them for long-horizon tasks remains challenging due to the scarcity of high-quality trajectories, especially under limited resources. Existing methods typically scale up rollout sizes and indiscriminately allocate computational resources among intermediate steps. Such attempts inherently waste substantial computation budget on trivial steps while failing to guarantee sample quality. To address this, we propose Spark (Strategic Policy-Aware exploRation via Key-state dynamic branching), a novel framework that selectively branches at critical decision states for resource-efficient exploration. Our key insight is to activate adaptive branching exploration at critical decision points to probe promising trajectories, thereby achieving…
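The contrast the abstract draws, between spreading rollouts evenly over every step and concentrating them at critical decision states, can be illustrated with a small budget-allocation sketch. This is not the paper's algorithm: the abstract does not say how Spark identifies critical states, so the sketch assumes, purely for illustration, that a step is "critical" when the policy's action distribution has high entropy. The function names `entropy` and `allocate_branches` and the parameters `budget`, `threshold`, and `max_branch` are all hypothetical.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def allocate_branches(
    step_probs: List[Sequence[float]],
    budget: int,
    threshold: float,
    max_branch: int,
) -> List[int]:
    """Decide how many rollouts to continue from each step.

    ASSUMPTION (not from the paper): policy entropy is used as the
    criticality signal. Every step keeps one rollout (the main
    trajectory); extra branches go to steps whose entropy exceeds
    `threshold`, highest entropy first, until `budget` is spent.
    """
    n = len(step_probs)
    if budget < n:
        raise ValueError("budget must cover one rollout per step")

    branches = [1] * n
    extra = budget - n
    scores = [entropy(p) for p in step_probs]

    # Visit steps from most to least uncertain.
    for i in sorted(range(n), key=lambda k: scores[k], reverse=True):
        if extra == 0 or scores[i] <= threshold:
            break
        give = min(max_branch - 1, extra)
        branches[i] += give
        extra -= give
    return branches


# Toy trajectory: steps 1 and 3 are uncertain, steps 0 and 2 are near-deterministic.
toy_steps = [
    [0.98, 0.01, 0.01],
    [0.34, 0.33, 0.33],
    [0.90, 0.05, 0.05],
    [0.50, 0.50, 0.00],
]
allocation = allocate_branches(toy_steps, budget=8, threshold=0.5, max_branch=3)
print(allocation)
```

With the toy inputs above, the extra rollouts land on the two uncertain steps while the near-deterministic steps keep a single rollout each. A uniform scheme with the same budget would instead give every step two rollouts regardless of how trivial the decision is.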

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis