FLARE: Learning Future-Aware Latent Representations from Vision-Language Models for Autonomous Driving
Chengen Xie, Chonghao Sima, Tianyu Li, Bin Sun, Junjie Wu, Zhihui Hao, Hongyang Li

TL;DR
FLARE introduces a self-supervised framework that leverages pre-trained vision-language models for autonomous driving by predicting future scene dynamics in latent space, eliminating the need for language annotations and improving decision-making.
Contribution
It proposes a novel self-supervised learning method that activates the visual-semantic capabilities of pre-trained VLMs for driving without language supervision, improving both scene understanding and the learned control policy.
Findings
Achieves state-of-the-art results on the NAVSIM benchmark.
Effectively predicts future scene dynamics in latent space.
Improves autonomous driving performance without language annotations.
Abstract
While Vision-Language Models (VLMs) offer rich world knowledge for end-to-end autonomous driving, current approaches heavily rely on labor-intensive language annotations (e.g., VQA) to bridge perception and control. This paradigm suffers from a fundamental mismatch between discrete linguistic tokens and continuous driving trajectories, often leading to suboptimal control policies and inefficient utilization of pre-trained knowledge. To address these challenges, we propose FLARE (Future-aware LAtent REpresentation), a novel framework that activates the visual-semantic capabilities of pre-trained VLMs without requiring language supervision. Instead of aligning with text, we introduce a self-supervised future feature prediction objective. This mechanism compels the model to anticipate scene dynamics and ego-motion directly in the latent space, enabling the learning of robust driving…
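The abstract describes the future feature prediction objective only at a high level. The following is a minimal NumPy sketch of how such an objective is commonly set up, and it is not the paper's implementation. Every choice in it is an assumption for illustration: the hypothetical per-token linear predictor `predict_future_latents`, the cosine-distance loss against fixed future-frame features, the L1 waypoint loss, and the weighting `lam`. None of these come from the source.

```python
import numpy as np

def cosine_distance_loss(pred, target, eps=1e-8):
    # pred, target: (batch, tokens, dim). Returns the mean of (1 - cosine similarity).
    # In an autograd framework the target would be detached (stop-gradient);
    # NumPy has no gradients, so it is simply treated as a fixed array here.
    pred_n = pred / (np.linalg.norm(pred, axis=-1, keepdims=True) + eps)
    tgt_n = target / (np.linalg.norm(target, axis=-1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(pred_n * tgt_n, axis=-1)))

def predict_future_latents(current, weight, bias):
    # Hypothetical predictor head: one linear layer applied to every token
    # of the current latent representation to guess the future latents.
    return current @ weight + bias

def l1_trajectory_loss(pred_traj, gt_traj):
    # pred_traj, gt_traj: (batch, horizon, 2) ego waypoints (assumed format).
    return float(np.mean(np.abs(pred_traj - gt_traj)))

def total_loss(traj_loss, future_loss, lam=0.5):
    # lam is an assumed weighting, not a value reported by the paper.
    return traj_loss + lam * future_loss

# Usage with random stand-in data.
rng = np.random.default_rng(0)
B, T, D = 2, 16, 32
current = rng.normal(size=(B, T, D))        # stand-in for VLM hidden states at time t
future_target = rng.normal(size=(B, T, D))  # stand-in for features of a future frame
W = rng.normal(size=(D, D)) * 0.1
b = np.zeros(D)

pred = predict_future_latents(current, W, b)
f_loss = cosine_distance_loss(pred, future_target)
t_loss = l1_trajectory_loss(rng.normal(size=(B, 8, 2)), rng.normal(size=(B, 8, 2)))
loss = total_loss(t_loss, f_loss)
```

The design point this illustrates is that the supervision signal is a feature vector rather than text: the model is scored on how well its latent output anticipates the future scene, so no language tokens or VQA annotations are involved.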
Taxonomy
Topics: Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications
