Anticipation-VLA: Solving Long-Horizon Embodied Tasks via Anticipation-based Subgoal Generation

Zhilong Zhang; Wenyu Luo; Haonan Wang; Yifei Sheng; Yidi Wang; Hanyuan Guo; Haoxiang Ren; Xinghao Du; Yuhan Che; Tongtong Cao; Lei Yuan; Yang Yu

arXiv:2605.01772 · cs.RO · May 5, 2026

TL;DR

Anticipation-VLA introduces an adaptive hierarchical approach with a recursive anticipation model for improved long-horizon embodied task execution using vision-language-action models.

Contribution

It presents a novel anticipation model that adaptively generates subgoals, enhancing robustness in long-horizon tasks compared to fixed-granularity methods.

Findings

01. Outperforms existing models in simulated robotic tasks

02. Demonstrates effectiveness in real-world robotic experiments

03. Highlights the importance of adaptive subgoal generation

Abstract

Vision-Language-Action (VLA) models have emerged as a powerful paradigm for embodied intelligence, enabling robots to perform tasks based on natural language instructions and current visual input. However, existing VLA models struggle with long-horizon tasks due to compounding errors. Prior methods decompose tasks into subtasks of fixed granularity, which cannot adapt to the varying complexity of execution states, limiting their robustness in long-horizon tasks. To overcome this, we introduce the Anticipation Model, which adaptively and recursively generates future subgoals. This model continuously adapts as the task unfolds, adjusting future subgoals in response to evolving dynamics, facilitating more reliable planning paths. Building on this concept, we propose Anticipation-VLA, a hierarchical VLA model that leverages the anticipation model to generate actionable subgoals that guide VLA…
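The idea of recursive, adaptive subgoal generation can be illustrated with a toy sketch. This is not the paper's model: the state is a single number, and the names `anticipate_subgoals`, `run_episode`, `toy_reachable`, `toy_propose`, and `toy_step` are hypothetical stand-ins for a learned anticipation model, a reachability judgment, and a low-level VLA policy. It only shows the control flow the abstract describes: split a task until each hop looks achievable, act on the first subgoal, then re-anticipate from the state actually reached.

```python
from typing import Callable, List

Reachable = Callable[[float, float], bool]
Propose = Callable[[float, float], float]


def anticipate_subgoals(state: float, goal: float, is_reachable: Reachable,
                        propose: Propose, max_depth: int = 8) -> List[float]:
    """Recursively split (state -> goal) until every hop is judged reachable.

    Hypothetical stand-in for an anticipation model; the granularity of the
    returned subgoals depends on the current state, not on a fixed step count.
    """
    if max_depth == 0 or is_reachable(state, goal):
        return [goal]
    mid = propose(state, goal)  # anticipated intermediate subgoal
    first = anticipate_subgoals(state, mid, is_reachable, propose, max_depth - 1)
    rest = anticipate_subgoals(mid, goal, is_reachable, propose, max_depth - 1)
    return first + rest


def run_episode(state: float, goal: float,
                low_level_step: Callable[[float, float], float],
                is_reachable: Reachable, propose: Propose,
                tol: float = 0.05, max_steps: int = 100) -> float:
    """Closed loop: re-anticipate after every low-level step."""
    for _ in range(max_steps):
        if abs(goal - state) <= tol:
            break
        subgoals = anticipate_subgoals(state, goal, is_reachable, propose)
        # Only the nearest subgoal is executed; the rest are regenerated
        # from whatever state the imperfect low-level policy actually reaches.
        state = low_level_step(state, subgoals[0])
    return state


# Toy assumptions (not from the paper): a hop is "reachable" if it is at most
# 1.0 long, subgoals are midpoints, and the policy covers 90% of each hop.
def toy_reachable(s: float, g: float) -> bool:
    return abs(g - s) <= 1.0


def toy_propose(s: float, g: float) -> float:
    return (s + g) / 2.0


def toy_step(s: float, subgoal: float) -> float:
    return s + 0.9 * (subgoal - s)


if __name__ == "__main__":
    print(anticipate_subgoals(0.0, 4.0, toy_reachable, toy_propose))
    print(run_episode(0.0, 4.0, toy_step, toy_reachable, toy_propose))
```

With these toy functions, the first call prints `[1.0, 2.0, 3.0, 4.0]`. Because the low-level step undershoots each subgoal, the replanning loop produces a different subgoal sequence on every iteration, which is the contrast with a fixed-granularity decomposition computed once up front.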
