ProAct: Agentic Lookahead in Interactive Environments

Yangbin Yu; Mingyu Yang; Junyou Li; Yiming Gao; Feiyu Liu; Yijun Yang; Zichuan Lin; Jiafei Lyu; Yicheng Liu; Zhicong Lu; Deheng Ye; Jie Jiang

arXiv:2602.05327·cs.AI·February 6, 2026

TL;DR

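ProAct is a two-stage training framework for LLM agents that targets long-horizon planning in interactive environments. The agent first internalizes lookahead reasoning through supervised fine-tuning on search-derived trajectories, then refines its decision accuracy with an auxiliary Monte-Carlo value estimator during policy-gradient training.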

Contribution

ProAct's novel combination of Grounded LookAhead Distillation and Monte-Carlo Critic improves planning and decision-making in LLM agents without expensive inference-time search.

Findings

1. ProAct significantly improves planning accuracy in complex environments.
2. A 4B parameter model trained with ProAct outperforms open-source baselines.
3. ProAct demonstrates robust generalization to unseen environments.

Abstract

Existing Large Language Model (LLM) agents struggle in interactive environments requiring long-horizon planning, primarily due to compounding errors when simulating future states. To address this, we propose ProAct, a framework that enables agents to internalize accurate lookahead reasoning through a two-stage training paradigm. First, we introduce Grounded LookAhead Distillation (GLAD), where the agent undergoes supervised fine-tuning on trajectories derived from environment-based search. By compressing complex search trees into concise, causal reasoning chains, the agent learns the logic of foresight without the computational overhead of inference-time search. Second, to further refine decision accuracy, we propose the Monte-Carlo Critic (MC-Critic), a plug-and-play auxiliary value estimator designed to enhance policy-gradient algorithms like PPO and GRPO. By leveraging lightweight…
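The abstract is cut off before it explains how MC-Critic computes its value estimates, so the following is only a minimal sketch of the general idea behind a Monte Carlo value estimator: average the discounted returns of several cheap rollouts from a state. The toy environment `step`, the rollout policies, and every parameter name here are hypothetical illustrations, not the authors' implementation.

```python
import random
from typing import Callable, Tuple

Policy = Callable[[int, random.Random], int]


def step(state: int, action: int, goal: int = 5) -> Tuple[int, float, bool]:
    """Hypothetical toy environment: a 1-D walk that pays 1.0 on reaching `goal`."""
    next_state = max(0, state + action)
    done = next_state >= goal
    return next_state, (1.0 if done else 0.0), done


def rollout_return(state: int, policy: Policy, rng: random.Random,
                   gamma: float = 0.99, max_steps: int = 20) -> float:
    """Discounted return of one rollout that follows `policy` from `state`."""
    total, discount = 0.0, 1.0
    for _ in range(max_steps):
        state, reward, done = step(state, policy(state, rng))
        total += discount * reward
        discount *= gamma
        if done:
            break
    return total


def mc_value(state: int, policy: Policy, num_rollouts: int = 64,
             seed: int = 0, gamma: float = 0.99) -> float:
    """Monte Carlo value estimate: the mean return over `num_rollouts` rollouts."""
    rng = random.Random(seed)
    returns = [rollout_return(state, policy, rng, gamma)
               for _ in range(num_rollouts)]
    return sum(returns) / len(returns)


def random_policy(state: int, rng: random.Random) -> int:
    """Assumed lightweight rollout policy: move left or right uniformly."""
    return rng.choice([-1, 1])


def always_right(state: int, rng: random.Random) -> int:
    """Deterministic policy used to sanity-check the estimator."""
    return 1
```

In a policy-gradient setting, an estimate such as `mc_value(state, random_policy)` could serve as a baseline, with the advantage of a sampled trajectory taken as its observed return minus that estimate. How ProAct actually integrates the critic with PPO or GRPO is not stated in the visible part of the abstract.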

Code & Models

Models

biang889/ProAct (Hugging Face model, ♡ 4)

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)