Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving

Lingdong Kong; Xiang Xu; Jiawei Ren; Wenwei Zhang; Liang Pan; Kai Chen; Wei Tsang Ooi; Ziwei Liu

arXiv:2405.05258·cs.CV·December 8, 2025

TL;DR

This paper introduces LaserMix++, a semi-supervised framework that leverages multi-modal data and novel augmentation techniques to improve 3D scene understanding in autonomous driving with significantly less labeled data.

Contribution

The study presents LaserMix++, a new semi-supervised learning framework that integrates multi-modal data, laser beam manipulations, and language-driven guidance for efficient 3D scene understanding.

Findings

01. Outperforms fully supervised methods with five times fewer annotations.

02. Achieves comparable accuracy to fully supervised models.

03. Significantly improves baseline semi-supervised approaches.

Abstract

Efficient data utilization is crucial for advancing 3D scene understanding in autonomous driving, where reliance on heavily human-annotated LiDAR point clouds challenges fully supervised methods. Addressing this, our study extends into semi-supervised learning for LiDAR semantic segmentation, leveraging the intrinsic spatial priors of driving scenes and multi-sensor complements to augment the efficacy of unlabeled datasets. We introduce LaserMix++, an evolved framework that integrates laser beam manipulations from disparate LiDAR scans and incorporates LiDAR-camera correspondences to further assist data-efficient learning. Our framework is tailored to enhance 3D scene consistency regularization by incorporating multi-modality, including 1) multi-modal LaserMix operation for fine-grained cross-sensor interactions; 2) camera-to-LiDAR feature distillation that enhances LiDAR feature…
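
The abstract names a LaserMix operation built on "laser beam manipulations from disparate LiDAR scans" but does not spell out the mechanics on this page. As a rough illustration only, the sketch below assumes the mixing partitions each scan into bands by inclination angle and interleaves alternating bands from two scans. The function names (`inclination`, `area_index`, `beam_mix`), the number of bands, and the vertical field of view of -25 to 3 degrees are all hypothetical choices made for this example, not values taken from the paper or its repository.

```python
import math


def inclination(point):
    """Inclination (pitch) angle of a LiDAR point (x, y, z), in radians."""
    x, y, z = point
    return math.atan2(z, math.hypot(x, y))


def area_index(angle, lo, hi, num_areas):
    """Map an inclination angle to one of num_areas equal-width bands.

    Angles outside [lo, hi] are clamped to the first or last band.
    """
    frac = (angle - lo) / (hi - lo)
    return min(max(int(frac * num_areas), 0), num_areas - 1)


def beam_mix(scan_a, scan_b, num_areas=4,
             lo=math.radians(-25.0), hi=math.radians(3.0)):
    """Interleave two scans by inclination band (assumed mixing rule).

    Each scan is a list of ((x, y, z), label) pairs. Even-numbered bands
    are taken from scan_a and odd-numbered bands from scan_b, so the
    mixed scan keeps the vertical layout of a driving scene while
    combining content from both inputs.
    """
    mixed = []
    for scan, parity in ((scan_a, 0), (scan_b, 1)):
        for point, label in scan:
            band = area_index(inclination(point), lo, hi, num_areas)
            if band % 2 == parity:
                mixed.append((point, label))
    return mixed
```

In a semi-supervised setting, `scan_a` could be a labeled scan and `scan_b` an unlabeled scan carrying pseudo-labels, with a consistency loss applied to predictions on the mixed result. That pairing is also an assumption here, since the abstract only states that the framework enhances 3D scene consistency regularization.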

Code & Models

Repositories

ldkong1205/LaserMix (PyTorch, official)

Taxonomy

Topics: Image Processing and 3D Reconstruction · 3D Shape Modeling and Analysis · Advanced Vision and Imaging