Learning 3D Semantic Segmentation with only 2D Image Supervision

Kyle Genova; Xiaoqi Yin; Abhijit Kundu; Caroline Pantofaru; Forrester Cole; Avneesh Sud; Brian Brewington; Brian Shucker; Thomas Funkhouser

arXiv:2110.11325·cs.CV·October 22, 2021

TL;DR

This paper introduces a method for training 3D semantic segmentation models using only 2D image annotations. Pseudo-labels are derived from 2D image segmentations via multiview fusion, which sidesteps the scarcity of 3D annotations and their poor transfer across sensors.

Contribution

It proposes 2D3DNet, a framework that uses 2D image labels to supervise 3D segmentation, together with strategies for selecting trusted pseudo-labels and for sampling scenes that contain rare objects.
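The idea of deriving 3D pseudo-labels from 2D segmentations can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `fuse_views`, the `IGNORE` marker, and the `min_votes` and `min_agreement` thresholds are assumptions made for illustration, a simple pinhole camera is assumed, and occlusion is not handled. Each 3D point is projected into every labeled view, the per-pixel class it lands on casts one vote, and a point keeps its majority label only if enough views agree.

```python
import numpy as np

IGNORE = -1  # hypothetical marker for points without a trusted pseudo-label


def fuse_views(points, views, num_classes, min_votes=2, min_agreement=0.7):
    """Vote 2D per-pixel labels onto 3D points (illustrative sketch).

    points: (N, 3) array of world coordinates.
    views:  iterable of (K, world_to_cam, label_map) where K is a (3, 3)
            pinhole intrinsics matrix, world_to_cam is a (4, 4) transform,
            and label_map is an (H, W) integer array with class ids in
            [0, num_classes).
    Returns an (N,) integer array of class ids, or IGNORE where untrusted.
    """
    n = points.shape[0]
    votes = np.zeros((n, num_classes), dtype=np.int64)
    homog = np.concatenate([points, np.ones((n, 1))], axis=1)  # (N, 4)

    for K, world_to_cam, label_map in views:
        h, w = label_map.shape
        cam = (world_to_cam @ homog.T).T[:, :3]   # points in the camera frame
        in_front = cam[:, 2] > 1e-6               # discard points behind the camera
        z = np.where(in_front, cam[:, 2], 1.0)    # avoid dividing by depth <= 0
        pix = (K @ cam.T).T
        u = np.floor(pix[:, 0] / z).astype(np.int64)  # pixel column
        v = np.floor(pix[:, 1] / z).astype(np.int64)  # pixel row
        visible = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        idx = np.nonzero(visible)[0]
        labels = label_map[v[idx], u[idx]]
        np.add.at(votes, (idx, labels), 1)        # one vote per point per view

    total = votes.sum(axis=1)
    best = votes.argmax(axis=1)
    agreement = votes.max(axis=1) / np.maximum(total, 1)
    trusted = (total >= min_votes) & (agreement >= min_agreement)
    return np.where(trusted, best, IGNORE)
```

Points marked `IGNORE` would simply be excluded from the 3D training loss. A realistic pipeline would also need a visibility test (for example, comparing against a depth map) so that points hidden behind other geometry do not receive labels from the surface in front of them.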

Findings

01. Achieves a +6.2 to +11.4 mIoU improvement over baselines

02. Effective pseudo-label selection and scene sampling methods

03. Demonstrates strong generalization across diverse urban datasets
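For readers unfamiliar with the metric behind the first finding, mIoU (mean intersection over union) averages, over classes, the ratio of correctly labeled points to the union of predicted and ground-truth points for that class. The sketch below is a generic implementation, not the paper's evaluation code: the function name `mean_iou`, the `ignore_index` convention, and the choice to skip classes absent from both prediction and ground truth are assumptions.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=-1):
    """Mean intersection-over-union across classes (illustrative sketch).

    pred, target: integer class ids of equal length; pred values must lie
    in [0, num_classes). Entries whose target equals ignore_index are skipped.
    """
    pred = np.asarray(pred)
    target = np.asarray(target)
    keep = target != ignore_index
    pred, target = pred[keep], target[keep]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    tp = np.diag(conf)                                # correct points per class
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp  # predicted + actual - overlap
    present = union > 0                               # skip classes that never occur
    return float((tp[present] / union[present]).mean())
```

With predictions `[0, 0, 1, 1]` and ground truth `[0, 1, 1, 1]`, class 0 scores 1/2 and class 1 scores 2/3, so the function returns their average, 7/12. Benchmarks typically report this value as a percentage, so a gain such as +6.2 is usually read as percentage points.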

Abstract

With the recent growth of urban mapping and autonomous driving efforts, there has been an explosion of raw 3D data collected from terrestrial platforms with lidar scanners and color cameras. However, due to high labeling costs, ground-truth 3D semantic segmentation annotations are limited in both quantity and geographic diversity, while also being difficult to transfer across sensors. In contrast, large image collections with ground-truth semantic segmentations are readily available for diverse sets of scenes. In this paper, we investigate how to use only those labeled 2D image collections to supervise training 3D semantic segmentation models. Our approach is to train a 3D model from pseudo-labels derived from 2D semantic image segmentations using multiview fusion. We address several novel issues with this approach, including how to select trusted pseudo-labels, how to sample 3D scenes…
