HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models

Minghui Lin; Pengxiang Ding; Shu Wang; Zifeng Zhuang; Yang Liu; Xinyang Tong; Wenxuan Song; Shangke Lyu; Siteng Huang; Donglin Wang

arXiv:2512.09928 · cs.RO · April 10, 2026

TL;DR

HiF-VLA introduces a motion-centric world model for vision-language-action tasks, enabling robots to reason about past and future dynamics for improved long-horizon manipulation.

Contribution

The paper presents a unified framework that uses motion for bidirectional temporal reasoning, improving long-horizon robotic manipulation performance.

Findings

01

Surpasses strong baselines on LIBERO-Long and CALVIN ABC-D benchmarks.

02

Achieves real-world improvements in long-horizon manipulation tasks.

03

Incurs negligible additional inference latency.

Abstract

Vision-Language-Action (VLA) models have recently enabled robotic manipulation by grounding visual and linguistic cues into actions. However, most VLAs assume the Markov property, relying only on the current observation and thus suffering from temporal myopia that degrades long-horizon coherence. In this work, we view motion as a more compact and informative representation of temporal context and world dynamics, capturing inter-state changes while filtering static pixel-level noise. From this perspective, HiF-VLA equips the VLA with a motion-centric world model, enabling agents to reason about temporal dynamics and future evolution during action generation. Building on this idea, we propose HiF-VLA (Hindsight, Insight, and Foresight for VLAs), a unified framework that leverages motion for bidirectional temporal reasoning. HiF-VLA encodes past dynamics through hindsight priors,…

Code & Models

Repositories

openhelix-team/HiF-VLA
github

Models

Datasets

minnielin/libero_trajid_rlds
dataset · 245 downloads
