See Less, Drive Better: Generalizable End-to-End Autonomous Driving via Foundation Models Stochastic Patch Selection

Amir Mallak; Erfan Aasi; Shiva Sreeram; Tsun-Hsuan Wang; Daniela Rus; Alaa Maalouf

arXiv:2601.10707·cs.CV·January 16, 2026

TL;DR

This paper introduces Stochastic-Patch-Selection (SPS), a method that improves the robustness and generalization of autonomous driving policies by randomly masking patch features during training, leading to better out-of-distribution (OOD) performance and real-world transfer.

Contribution

The paper proposes SPS, a novel stochastic patch masking technique that enhances policy robustness and generalization in autonomous driving by reducing redundancy and overfitting in foundation model features.
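As a rough illustration of what randomly masking patch features can look like, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `mask_patches`, the `keep_ratio` parameter, the `(num_patches, feature_dim)` layout, and the choice to zero out masked patches (rather than drop them) are all assumptions made for this example.

```python
import numpy as np


def mask_patches(patch_features, keep_ratio, rng):
    """Randomly keep a subset of patch features and zero out the rest.

    Hypothetical helper, not taken from the paper.
    patch_features: array of shape (num_patches, feature_dim) for one frame.
    keep_ratio: fraction of patches to keep (assumed hyperparameter).
    rng: a numpy.random.Generator, so the sampling is reproducible.
    Returns the masked features and the boolean keep-mask.
    """
    num_patches = patch_features.shape[0]
    # Always keep at least one patch so the policy input is never empty.
    num_keep = max(1, int(round(keep_ratio * num_patches)))
    keep_idx = rng.choice(num_patches, size=num_keep, replace=False)

    mask = np.zeros(num_patches, dtype=bool)
    mask[keep_idx] = True

    # Assumption: masked patches are replaced by zeros; positions are preserved.
    masked = np.where(mask[:, None], patch_features, 0.0)
    return masked, mask


# Usage: draw a fresh mask for a frame of 64 patches with 8-dim features.
rng = np.random.default_rng(0)
frame_features = rng.normal(size=(64, 8))
masked_features, keep_mask = mask_patches(frame_features, keep_ratio=0.5, rng=rng)
```

Calling the function again with the same generator draws a different subset, which is the stochastic part of the idea; a policy trained this way cannot rely on any single patch always being present.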

Findings

01. SPS outperforms state-of-the-art methods in OOD scenarios with a 6.2% average improvement.

02. SPS achieves up to 20.4% better performance in closed-loop simulations.

03. The learned policy successfully transfers to real-world driving without tuning.

Abstract

Recent advances in end-to-end autonomous driving show that policies trained on patch-aligned features extracted from foundation models generalize better to Out-of-Distribution (OOD) scenarios. We hypothesize that due to the self-attention mechanism, each patch feature implicitly embeds information from all other patches, represented in a different way and with a different intensity, making these descriptors highly redundant. We quantify redundancy in such (BLIP2) features via PCA and cross-patch similarity: $90\%$ of the variance is captured by $17/64$ principal components, and strong inter-token correlations are pervasive. Training on such overlapping information leads the policy to overfit spurious correlations, hurting OOD robustness. We present Stochastic-Patch-Selection (SPS), a simple yet effective approach for learning policies that are more robust, generalizable, and efficient. For every frame, SPS…
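The two redundancy measurements mentioned in the abstract (PCA variance concentration and cross-patch similarity) can be sketched as follows. This is an illustrative sketch only: the function names, the `(num_patches, feature_dim)` layout, the use of cosine similarity, and the synthetic low-rank data are assumptions, and the code does not reproduce the paper's BLIP2 numbers.

```python
import numpy as np


def components_for_variance(features, threshold=0.90):
    """Smallest number of principal components whose cumulative
    explained variance reaches `threshold`.

    features: array of shape (num_samples, feature_dim); rows are treated
    as samples (assumed layout, e.g. one row per patch).
    """
    centered = features - features.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data are proportional
    # to the variance explained by each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values ** 2
    cumulative = np.cumsum(explained) / explained.sum()
    # First index where the cumulative ratio reaches the threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative))


def mean_cross_patch_cosine(features):
    """Mean cosine similarity over all pairs of distinct rows."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    similarity = unit @ unit.T
    off_diagonal = ~np.eye(similarity.shape[0], dtype=bool)
    return float(similarity[off_diagonal].mean())


# Usage on synthetic, deliberately redundant features: 64 "patches"
# generated from only 4 latent directions plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(64, 4))
basis = rng.normal(size=(4, 32))
synthetic = latent @ basis + 0.01 * rng.normal(size=(64, 32))

num_components = components_for_variance(synthetic, threshold=0.90)
mean_similarity = mean_cross_patch_cosine(synthetic)
```

On data like this, a small `num_components` relative to the feature dimension signals that most of the variance lives in a few directions, which is the kind of redundancy the abstract describes.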

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis