FLARE: Learning Future-Aware Latent Representations from Vision-Language Models for Autonomous Driving

Chengen Xie; Chonghao Sima; Tianyu Li; Bin Sun; Junjie Wu; Zhihui Hao; Hongyang Li

arXiv:2601.05611 · cs.CV · March 10, 2026

Open Access

TL;DR

FLARE introduces a self-supervised framework that leverages pre-trained vision-language models for autonomous driving by predicting future scene dynamics in latent space, eliminating the need for language annotations and improving decision-making.

Contribution

FLARE proposes a self-supervised learning method that activates pre-trained VLMs for driving without language supervision, with the aim of improving both scene understanding and the resulting control policy.

Findings

01. Achieves state-of-the-art results on the NAVSIM benchmark.

02. Effectively predicts future scene dynamics in latent space.

03. Improves autonomous driving performance without language annotations.

Abstract

While Vision-Language Models (VLMs) offer rich world knowledge for end-to-end autonomous driving, current approaches heavily rely on labor-intensive language annotations (e.g., VQA) to bridge perception and control. This paradigm suffers from a fundamental mismatch between discrete linguistic tokens and continuous driving trajectories, often leading to suboptimal control policies and inefficient utilization of pre-trained knowledge. To address these challenges, we propose FLARE (Future-aware LAtent REpresentation), a novel framework that activates the visual-semantic capabilities of pre-trained VLMs without requiring language supervision. Instead of aligning with text, we introduce a self-supervised future feature prediction objective. This mechanism compels the model to anticipate scene dynamics and ego-motion directly in the latent space, enabling the learning of robust driving…
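
The excerpt above does not specify the exact form of the future feature prediction objective, so the following is only a minimal sketch of what such a latent-space loss could look like. It assumes the model outputs predicted feature vectors for a future frame and compares them with features extracted from the actual future frame, using cosine distance. The function names, the choice of cosine distance, and the list-of-vectors layout are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Small epsilon avoids division by zero for all-zero vectors.
    return dot / (norm_a * norm_b + 1e-8)


def future_feature_loss(predicted, target):
    """Hypothetical latent prediction loss: mean (1 - cosine similarity).

    predicted: feature vectors the model predicts for a future frame.
    target:    feature vectors extracted from the actual future frame,
               treated as fixed targets (assumption: no gradient flows
               through them in a real training setup).
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same number of tokens")
    distances = [1.0 - cosine_similarity(p, t) for p, t in zip(predicted, target)]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    predicted = [[1.0, 0.0], [0.5, 0.5]]
    target = [[1.0, 0.0], [0.0, 1.0]]
    print(future_feature_loss(predicted, target))
```

Under this sketch the loss approaches 0 when predicted and target features point in the same direction and grows toward 2 as they point in opposite directions. No text labels enter the computation, which is the sense in which such an objective is self-supervised.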

Taxonomy

Topics: Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications