FROST-Drive: Scalable and Efficient End-to-End Driving with a Frozen Vision Encoder
Zeyu Dong, Yimin Zhu, Yu Wu, Yu Sun

TL;DR
FROST-Drive is a scalable end-to-end autonomous driving model built around a frozen pretrained vision encoder. The authors report better generalization and robustness in complex, real-world scenarios than fully fine-tuned models achieve.
Contribution
The paper proposes a novel architecture that keeps the vision encoder frozen so that generalized knowledge from a Vision-Language Model (VLM) transfers directly to the driving task. This challenges the assumption that the encoder must be fully fine-tuned.
Findings
Outperforms fine-tuned models on the Waymo E2E Dataset
Demonstrates improved robustness in long-tail scenarios
Validates the effectiveness of frozen vision encoders in autonomous driving
Abstract
End-to-end (E2E) models in autonomous driving aim to directly map sensor inputs to control commands, but their ability to generalize to novel and complex scenarios remains a key challenge. The common practice of fully fine-tuning the vision encoder on driving datasets potentially limits its generalization by causing the model to specialize too heavily in the training data. This work challenges the necessity of this training paradigm. We propose FROST-Drive, a novel E2E architecture designed to preserve and leverage the powerful generalization capabilities of a pretrained vision encoder from a Vision-Language Model (VLM). By keeping the encoder's weights frozen, our approach directly transfers the rich, generalized world knowledge from the VLM to the driving task. Our model architecture combines this frozen encoder with a transformer-based adapter for multimodal fusion and a GRU-based…
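The core training idea in the abstract, keeping the encoder's weights fixed while only the downstream components learn, can be illustrated with a toy sketch. This is not the FROST-Drive architecture: plain-Python linear maps stand in for the VLM encoder and for the trainable part, and every name here (`encode`, `train_head`, `encoder_w`, the synthetic target) is a hypothetical chosen for illustration.

```python
import random

def encode(encoder_w, x):
    """Frozen 'encoder' stand-in: a fixed linear map from input to features."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in encoder_w]

def predict(head, encoder_w, x):
    """Trainable linear head applied on top of the frozen features."""
    return sum(h * f for h, f in zip(head, encode(encoder_w, x)))

def mse(head, encoder_w, data):
    """Mean squared error of the head's predictions over the dataset."""
    return sum((predict(head, encoder_w, x) - y) ** 2 for x, y in data) / len(data)

def train_head(encoder_w, data, lr=0.05, epochs=200):
    """Fit the head with plain SGD; encoder_w is read but never updated."""
    head = [0.0] * len(encoder_w)
    for _ in range(epochs):
        for x, y in data:
            feats = encode(encoder_w, x)
            err = sum(h * f for h, f in zip(head, feats)) - y
            # Gradient of 0.5 * err**2 with respect to the head weights only.
            head = [h - lr * err * f for h, f in zip(head, feats)]
    return head

rng = random.Random(0)
# Hypothetical "pretrained" encoder weights: 3 inputs -> 4 features.
encoder_w = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
snapshot = [row[:] for row in encoder_w]  # copy used to check nothing changed

# Synthetic regression task (an assumption, not driving data).
data = []
for _ in range(32):
    x = [rng.uniform(-1, 1) for _ in range(3)]
    data.append((x, 2.0 * x[0] - x[1]))

head = train_head(encoder_w, data)
print("encoder unchanged:", encoder_w == snapshot)
print("loss before/after:", mse([0.0] * 4, encoder_w, data), mse(head, encoder_w, data))
```

In a deep learning framework the same effect is usually obtained by excluding the encoder's parameters from gradient computation and from the optimizer, so that only the adapter and decoder receive updates.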
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Autonomous Vehicle Technology and Safety
