FUTURE-VLA: Forecasting Unified Trajectories Under Real-time Execution

Jingjing Fan; Yushan Liu; Shoujie Li; Botao Ren; Siyuan Li; Xiao-Ping Zhang; Wenbo Ding; Zhidong Deng

arXiv:2602.15882 · cs.RO · February 19, 2026

TL;DR

FUTURE-VLA is a unified vision-language-action (VLA) model for robotics that performs long-horizon control and future forecasting in real time. It combines temporally adaptive compression of the observation history with latent-space autoregression, and it achieves state-of-the-art success rates while keeping latency comparable to single-frame models.

Contribution

It presents an architecture that reformulates long-horizon control and future forecasting as a single sequence-generation task, enabling efficient, real-time spatiotemporal reasoning for robotics applications.
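To make the "single sequence-generation task" framing concrete, here is a minimal NumPy sketch of how one unified token sequence and its attention mask might be laid out. Everything in it is an assumption for illustration rather than the paper's published design: the segment names, the hypothetical `build_unified_sequence` helper, the bidirectional context block, and the placement of action slots before future-latent slots are all hypothetical.

```python
import numpy as np


def build_unified_sequence(num_context: int, num_action: int, num_future: int):
    """Hypothetical layout: [context | action slots | future-latent slots] in one sequence."""
    segments = (["context"] * num_context
                + ["action"] * num_action
                + ["future"] * num_future)
    n = len(segments)
    # mask[i, j] is True when token i may attend to token j.
    mask = np.zeros((n, n), dtype=bool)
    # Context tokens (compressed history plus instruction) attend bidirectionally to each other.
    mask[:num_context, :num_context] = True
    # Each output slot sees the whole context and, causally, every earlier output slot.
    for i in range(num_context, n):
        mask[i, :i + 1] = True
    return segments, mask


segments, mask = build_unified_sequence(num_context=4, num_action=2, num_future=3)
for name, row in zip(segments, mask.astype(int)):
    print(f"{name:>7}", row)
```

Because every output slot is already present in the input, one forward pass under this mask produces predictions for all action and future-latent slots at once, while the causal part of the mask keeps an autoregressive ordering among them. This is one common way to reconcile autoregressive structure with single-pass decoding; the summary does not confirm it as FUTURE-VLA's mechanism or say how actions and forecasts are ordered.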

Findings

01. Achieves a 99.2% success rate on the LIBERO benchmark.

02. Attains a 75.4% success rate on the RoboTwin benchmark.

03. Maintains inference latency comparable to single-frame models.

Abstract

General vision-language models increasingly support unified spatiotemporal reasoning over long video streams, yet deploying such capabilities on robots remains constrained by the prohibitive latency of processing long-horizon histories and generating high-dimensional future predictions. To bridge this gap, we present FUTURE-VLA, a unified architecture that reformulates long-horizon control and future forecasting as a monolithic sequence-generation task. Adopting a dual-sided efficiency paradigm, FUTURE-VLA leverages a temporally adaptive compression strategy to maximize spatiotemporal information density, enabling the ingestion of extensive multi-view histories while maintaining constant inference latency. Simultaneously, it performs latent-space autoregression to align actionable dynamics with reviewable visual look-aheads in a single forward pass. These real-time predictive…
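The abstract does not spell out the compression schedule, so the following is only a minimal NumPy sketch of one way a temporally adaptive scheme could keep the token count, and hence transformer cost, fixed as the history grows. The geometric `decay` schedule, the fixed `budget`, and the helpers `allocate_budget`, `adaptive_pool_tokens`, and `compress_history` are hypothetical stand-ins, not FUTURE-VLA's actual method: the newest frame keeps the most tokens, older frames are average-pooled to geometrically fewer tokens, and frames whose share rounds to zero are dropped.

```python
import numpy as np


def allocate_budget(num_frames: int, tokens_per_frame: int, budget: int,
                    decay: float = 0.5) -> list[int]:
    """Hypothetical schedule: tokens per frame shrink geometrically with age."""
    # Age 0 is the newest frame; older frames get a smaller share of the budget.
    weights = decay ** np.arange(num_frames)
    raw = weights / weights.sum() * budget
    # Flooring keeps the total at or below `budget`; a frame cannot keep more tokens than it has.
    counts = np.clip(np.floor(raw).astype(int), 0, tokens_per_frame)
    return counts.tolist()


def adaptive_pool_tokens(tokens: np.ndarray, k: int) -> np.ndarray:
    """Average-pool an (N, D) token array to k tokens along the token axis (k <= N)."""
    # Split the token indices into k nearly equal contiguous chunks and average each one.
    chunks = np.array_split(np.arange(tokens.shape[0]), k)
    return np.stack([tokens[idx].mean(axis=0) for idx in chunks])


def compress_history(frames: list[np.ndarray], budget: int,
                     decay: float = 0.5) -> np.ndarray:
    """Compress frames (ordered oldest -> newest, each (N, D)) to at most `budget` tokens."""
    newest_first = frames[::-1]
    counts = allocate_budget(len(frames), frames[0].shape[0], budget, decay)
    kept = [adaptive_pool_tokens(f, c)
            for f, c in zip(newest_first, counts) if c > 0]  # zero-share frames are dropped
    if not kept:
        return np.empty((0, frames[0].shape[1]))
    # Restore chronological order before handing the tokens to the model.
    return np.concatenate(kept[::-1], axis=0)


rng = np.random.default_rng(0)
for history_len in (4, 16, 64):
    frames = [rng.standard_normal((196, 32)) for _ in range(history_len)]
    print(history_len, compress_history(frames, budget=256).shape)
```

Because the allocation is normalized to `budget` and floored, the output never exceeds `budget` tokens however many frames are supplied, which mirrors the constant inference latency the abstract attributes to its compression strategy. A learned or content-aware allocation would be a natural replacement for the fixed geometric decay used here.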

Taxonomy

Topics: Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety · Generative Adversarial Networks and Image Synthesis