3D Annotation-Free Learning by Distilling 2D Open-Vocabulary Segmentation Models for Autonomous Driving

Boyi Sun; Yuhang Liu; Xingxia Wang; Bin Tian; Long Chen; Fei-Yue Wang

arXiv:2405.15286·cs.CV·January 8, 2025

TL;DR

This paper introduces AFOV, an annotation-free 3D segmentation framework for autonomous driving that distills 2D open-vocabulary segmentation models into a point cloud network through cross-modal distillation, reaching 47.73% mIoU on nuScenes without manual annotations.

Contribution

The paper proposes an annotation-free learning method for 3D point cloud segmentation that combines Tri-Modal contrastive Pre-training (TMP) on features from 2D open-vocabulary models with pseudo-label-based cross-modal distillation.

Findings

01. Achieved 47.73% mIoU on nuScenes without annotations.
02. Surpassed previous models by 3.13% mIoU in 3D segmentation.
03. Performed well with minimal labeled data, reaching 51.75% mIoU with 1% of the data.
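
The findings above are reported as mean intersection-over-union (mIoU). As a reminder of how that metric is commonly computed, here is a minimal NumPy sketch. The function name `mean_iou`, the `ignore_index` convention, and the averaging over classes that actually occur are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=-1):
    """Mean IoU over classes that appear in the prediction or the target.

    Hypothetical helper for illustration; benchmark toolkits may differ
    in how they treat ignored points and absent classes.
    """
    pred = np.asarray(pred)
    target = np.asarray(target)

    # Drop points whose ground-truth label is marked as ignored.
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Confusion matrix: rows are ground truth, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    inter = np.diag(conf).astype(float)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter

    # Average only over classes with a non-empty union.
    present = union > 0
    return float((inter[present] / union[present]).mean())
```

For per-point predictions and labels over the whole validation set, `mean_iou(pred, target, num_classes)` returns a value in [0, 1]; multiplying by 100 gives a percentage comparable in form to the numbers listed above.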

Abstract

Point cloud labeling is a time-consuming and expensive task in autonomous driving, whereas annotation-free learning avoids it by learning point cloud representations from unannotated data. In this paper, we propose AFOV, a novel 3D Annotation-Free framework assisted by 2D Open-Vocabulary segmentation models. It consists of two stages. In the first stage, we integrate high-quality textual and image features of 2D open-vocabulary models and propose Tri-Modal contrastive Pre-training (TMP). In the second stage, spatial mapping between point clouds and images is utilized to generate pseudo-labels, enabling cross-modal knowledge distillation. In addition, we introduce Approximate Flat Interaction (AFI) to address noise during alignment and label confusion. To validate the superiority of AFOV, extensive…
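
The second stage relies on a spatial mapping between point clouds and images to produce pseudo-labels. A minimal sketch of that general idea follows, assuming a pinhole camera with known intrinsics `K`, a known LiDAR-to-camera transform `T_cam_lidar`, and a per-pixel class map from a 2D segmentation model. The function name `project_pseudo_labels`, its parameters, and the `ignore_index` value are hypothetical. The abstract does not give AFOV's actual label-generation procedure, and the AFI step is not modelled here.

```python
import numpy as np

def project_pseudo_labels(points, seg_mask, K, T_cam_lidar, ignore_index=-1):
    """Assign each LiDAR point the class of the pixel it projects onto.

    points:      (N, 3) point coordinates in the LiDAR frame
    seg_mask:    (H, W) integer class ids from a 2D segmentation model
    K:           (3, 3) pinhole camera intrinsics (assumed known)
    T_cam_lidar: (4, 4) rigid transform from LiDAR to camera frame (assumed known)
    Points behind the camera or outside the image receive ignore_index.
    """
    height, width = seg_mask.shape
    n = points.shape[0]

    # Move the points into the camera frame using homogeneous coordinates.
    homo = np.hstack([points, np.ones((n, 1))])
    cam = (T_cam_lidar @ homo.T).T[:, :3]

    labels = np.full(n, ignore_index, dtype=np.int64)

    # Only points with positive depth can be seen by the camera.
    in_front = cam[:, 2] > 1e-6
    uvw = (K @ cam[in_front].T).T
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)

    # Keep projections that land inside the image bounds.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    idx = np.flatnonzero(in_front)[inside]
    labels[idx] = seg_mask[v[inside], u[inside]]
    return labels
```

Points left at `ignore_index` would typically be excluded from the distillation loss. Labels obtained this way inherit calibration and occlusion errors, which is the kind of alignment noise the abstract says AFI is meant to address.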

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling