IntentVLA: Short-Horizon Intent Modeling for Aliased Robot Manipulation

Shijie Lian; Bin Yu; Xiaopeng Lin; Zhaolong Shen; Laurence Tianruo Yang; Yurun Jin; Haishan Liu; Changti Wu; Hang Yuan; Cong Huang; Kai Chen

arXiv:2605.14712 · cs.RO · May 15, 2026


TL;DR

IntentVLA encodes recent visual observations into a short-horizon intent representation that conditions action-chunk generation, improving execution stability and task performance in aliasing-prone robot manipulation.

Contribution

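The paper introduces IntentVLA, a history-conditioned vision-language-action (VLA) policy, and AliasBench, a benchmark for short-horizon intent ambiguity. Together they address partial observability, where the current observation alone does not determine the demonstrator's intent.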

Findings

01. IntentVLA improves rollout stability across multiple benchmarks.

02. IntentVLA outperforms existing VLA baselines.

03. AliasBench isolates short-horizon observation aliasing effectively.

Abstract

Robot imitation data are often multimodal: similar visual-language observations may be followed by different action chunks because human demonstrators act with different short-horizon intents, task phases, or recent context. Existing frame-conditioned VLA policies infer each chunk from the current observation and instruction alone, so under partial observability they may resample different intents across adjacent replanning steps, leading to inter-chunk conflict and unstable execution. We introduce IntentVLA, a history-conditioned VLA framework that encodes recent visual observations into a compact short-horizon intent representation and uses it to condition chunk generation. We further introduce AliasBench, a 12-task ambiguity-aware benchmark on RoboTwin2 with matched training data and evaluation environments that isolate short-horizon observation aliasing. Across AliasBench,…
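The instability mechanism described in the abstract can be illustrated with a toy simulation. This is a sketch under stated assumptions, not IntentVLA's model: it assumes two equally likely demonstrator intents (`"left"`, `"right"`), treats every frame as fully aliased, and replaces the learned history encoder with a hypothetical majority vote over the motion visible in the last `window` frames. All function names are illustrative. `count_intent_switches` counts disagreements between adjacent replanning steps as a stand-in for inter-chunk conflict.

```python
import random
from collections import Counter, deque
from typing import Deque, List

# Hypothetical: two intents that a single aliased frame cannot tell apart.
INTENTS = ("left", "right")


def count_intent_switches(intents: List[str]) -> int:
    """Count adjacent replanning steps whose intents disagree (proxy for inter-chunk conflict)."""
    return sum(a != b for a, b in zip(intents, intents[1:]))


def rollout_frame_conditioned(steps: int, rng: random.Random) -> List[str]:
    # Every frame is aliased, so p(intent | frame) is 50/50 and each
    # replanning step resamples an intent independently of the last chunk.
    return [rng.choice(INTENTS) for _ in range(steps)]


def rollout_history_conditioned(steps: int, window: int, rng: random.Random) -> List[str]:
    # Hypothetical stand-in for a learned intent encoder: recent frames show
    # which way earlier chunks moved the gripper, summarized by majority vote.
    if window < 1:
        raise ValueError("window must be at least 1")
    recent_motion: Deque[str] = deque(maxlen=window)
    intents: List[str] = []
    for _ in range(steps):
        if recent_motion:
            intent = Counter(recent_motion).most_common(1)[0][0]
        else:
            # No history yet: the first chunk is still a coin flip.
            intent = rng.choice(INTENTS)
        intents.append(intent)
        # Executing the chunk leaves visible motion in the next frames.
        recent_motion.append(intent)
    return intents


if __name__ == "__main__":
    rng = random.Random(0)
    frame = rollout_frame_conditioned(50, rng)
    history = rollout_history_conditioned(50, window=4, rng=rng)
    print("frame-conditioned switches:", count_intent_switches(frame))
    print("history-conditioned switches:", count_intent_switches(history))  # always 0 here
```

Under these assumptions the frame-conditioned rollout switches intent at about half of its replanning steps on average, while the history-conditioned rollout commits after its first chunk and never switches. A real intent encoder must also release that commitment when the task phase changes, which this majority vote does not model.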


Code & Models

Repository: zgc-embodyai/IntentVLA (GitHub)
