LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving

Lingdong Kong; Xiang Xu; Youquan Liu; Jun Cen; Runnan Chen; Wenwei Zhang; Liang Pan; Kai Chen; Ziwei Liu

arXiv:2501.04005 · cs.CV · December 4, 2025

TL;DR

LargeAD introduces a scalable framework that uses vision foundation models (VFMs) for cross-modal 3D pretraining. By semantically aligning 2D image data with 3D LiDAR data, it improves downstream autonomous driving perception tasks such as LiDAR segmentation and detection.

Contribution

The paper presents a novel large-scale 3D pretraining method that uses vision foundation models (VFMs) for superpixel generation, cross-modal alignment, and temporal consistency across autonomous driving datasets.
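Cross-modal alignment starts by pairing LiDAR points with image superpixels. The following is a minimal sketch of that pairing in Python with NumPy, not the authors' implementation. It assumes a pinhole camera whose intrinsics and LiDAR-to-camera extrinsics are already combined into one 3x4 projection matrix. It also assumes a per-pixel superpixel label map that some VFM has produced beforehand; generating that map is out of scope here. The function names points_to_superpoints and pool_by_id are hypothetical.

```python
import numpy as np


def points_to_superpoints(points, proj, superpixels):
    """Assign each LiDAR point the id of the superpixel it projects into.

    points: (N, 3) point coordinates (assumed frame matching `proj`).
    proj: (3, 4) camera projection matrix (assumed intrinsics @ extrinsics).
    superpixels: (H, W) integer label map, assumed to come from a VFM.
    Returns (N,) superpoint ids; -1 marks points behind the camera
    or outside the image.
    """
    n = points.shape[0]
    homo = np.hstack([points, np.ones((n, 1))])  # (N, 4) homogeneous coords
    cam = homo @ proj.T                          # (N, 3) projected coords
    depth = cam[:, 2]
    in_front = depth > 1e-6                      # ignore points behind camera

    u = np.zeros(n)
    v = np.zeros(n)
    u[in_front] = cam[in_front, 0] / depth[in_front]  # pixel column (float)
    v[in_front] = cam[in_front, 1] / depth[in_front]  # pixel row (float)

    h, w = superpixels.shape
    col = np.floor(u).astype(np.int64)
    row = np.floor(v).astype(np.int64)
    valid = in_front & (col >= 0) & (col < w) & (row >= 0) & (row < h)

    ids = np.full(n, -1, dtype=np.int64)
    ids[valid] = superpixels[row[valid], col[valid]]
    return ids


def pool_by_id(features, ids, num_ids):
    """Average (N, D) point features into (num_ids, D) superpoint features.

    Points with id -1 are skipped; superpoints with no points stay zero.
    """
    keep = ids >= 0
    sums = np.zeros((num_ids, features.shape[1]))
    np.add.at(sums, ids[keep], features[keep])   # unbuffered scatter-add
    counts = np.bincount(ids[keep], minlength=num_ids).astype(float)
    return sums / np.maximum(counts, 1.0)[:, None]
```

Points that land behind the camera or outside the image receive -1 and are excluded from pooling. Each pooled row is therefore the mean feature of the points inside one superpixel. That mean is the kind of superpoint representation a contrastive objective can then pair with the matching superpixel.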

Findings

01. Substantial improvements in LiDAR segmentation and detection accuracy.
02. Effective cross-modal feature alignment across diverse datasets.
03. Robustness and generalization demonstrated on 11 large-scale datasets.

Abstract

Recent advancements in vision foundation models (VFMs) have revolutionized visual perception in 2D, yet their potential for 3D scene understanding, particularly in autonomous driving applications, remains underexplored. In this paper, we introduce LargeAD, a versatile and scalable framework designed for large-scale 3D pretraining across diverse real-world driving datasets. Our framework leverages VFMs to extract semantically rich superpixels from 2D images, which are aligned with LiDAR point clouds to generate high-quality contrastive samples. This alignment facilitates cross-modal representation learning, enhancing the semantic consistency between 2D and 3D data. We introduce several key innovations: (i) VFM-driven superpixel generation for detailed semantic representation, (ii) a VFM-assisted contrastive learning strategy to align multimodal features, (iii) superpoint temporal…
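The abstract describes contrastive samples built from superpixels aligned with LiDAR points. A common way to train on such pairs is an InfoNCE loss. The NumPy sketch below treats row i of the superpoint and superpixel embedding matrices as a positive pair and every other row as a negative. Several details are assumptions made for illustration, since the abstract does not give the exact loss LargeAD uses: the function name, the temperature of 0.07, and the optional labels argument. The labels argument is one hypothetical reading of "VFM-assisted": it drops negatives that share the anchor's semantic class.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row of x to unit length (guarding against zero rows)."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def superpixel_superpoint_infonce(point_emb, pixel_emb, temperature=0.07, labels=None):
    """InfoNCE loss where row i of point_emb and pixel_emb is a positive pair.

    point_emb: (K, D) pooled LiDAR (superpoint) embeddings.
    pixel_emb: (K, D) pooled image (superpixel) embeddings.
    labels: optional (K,) class per pair (hypothetical); other pairs with
        the anchor's class are removed from its negatives.
    """
    p = l2_normalize(point_emb)
    q = l2_normalize(pixel_emb)
    logits = p @ q.T / temperature            # (K, K) cosine sim / temperature
    k = logits.shape[0]

    if labels is not None:
        same = labels[:, None] == labels[None, :]
        same &= ~np.eye(k, dtype=bool)        # never mask the positive itself
        logits = np.where(same, -np.inf, logits)

    # Numerically stable log-softmax over each row (diagonal is always finite).
    row_max = logits.max(axis=1, keepdims=True)
    log_z = row_max[:, 0] + np.log(np.exp(logits - row_max).sum(axis=1))
    log_prob_pos = np.diag(logits) - log_z
    return float(-log_prob_pos.mean())
```

With labels=None, every non-matching pair counts as a negative. Passing per-pair class labels removes same-class pairs from each row's denominator. As a result, two superpixels that a VFM assigns to the same class are not pushed apart in feature space.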

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Traffic Prediction and Management Techniques · Remote Sensing and LiDAR Applications

Methods: ALIGN · Contrastive Learning