Faster Training, Fewer Labels: Self-Supervised Pretraining for Fine-Grained BEV Segmentation

Daniel Busch; Christian Bohn; Thomas Kurbiel; Klaus Friedrichs; Richard Meyes; Tobias Meisen

arXiv:2602.18066·cs.CV·February 23, 2026

TL;DR

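This paper introduces a self-supervised pretraining approach for fine-grained BEV road marking segmentation in autonomous driving. BEV predictions are differentiably reprojected into the camera images and trained against Mask2Former pseudo-labels, which cuts annotation needs and training time while improving mIoU over a comparable supervised baseline.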

Contribution

The paper proposes a two-phase training strategy: self-supervised pretraining against multi-view semantic pseudo-labels, followed by supervised fine-tuning on only 50% of the dataset. This reduces both the annotation requirement and the training time.

Findings

01 Up to +2.5 pp mIoU improvement over the supervised baseline

02 Halves the annotated data required during fine-tuning

03 Reduces total training time by up to two thirds
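
The metric behind the first finding can be illustrated briefly. mIoU averages the per-class intersection-over-union, and a gain in percentage points (pp) is the absolute difference between two mIoU scores multiplied by 100. The following is a minimal sketch, assuming flat lists of integer class labels; the helper names `mean_iou` and `pp_gain` are hypothetical and not taken from the paper. Classes absent from both prediction and target are skipped, which is one common convention but not necessarily the one the authors use.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes that occur in pred or target."""
    ious = []
    for c in range(num_classes):
        # Count pixels where both agree on class c (intersection)
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Count pixels where either one says class c (union)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: skip classes that never appear
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def pp_gain(baseline_miou, new_miou):
    """Absolute difference between two mIoU scores in percentage points."""
    return 100.0 * (new_miou - baseline_miou)
```

On a toy example with `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so `mean_iou` returns their average, 7/12.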

Abstract

Dense Bird's Eye View (BEV) semantic maps are central to autonomous driving, yet current multi-camera methods depend on costly, inconsistently annotated BEV ground truth. We address this limitation with a two-phase training strategy for fine-grained road marking segmentation that removes full supervision during pretraining and halves the amount of training data during fine-tuning while still outperforming the comparable supervised baseline model. During the self-supervised pretraining, BEVFormer predictions are differentiably reprojected into the image plane and trained against multi-view semantic pseudo-labels generated by the widely used semantic segmentation model Mask2Former. A temporal loss encourages consistency across frames. The subsequent supervised fine-tuning phase requires only 50% of the dataset and significantly less training time. With our method, the fine-tuning benefits…
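
The reprojection idea in the abstract can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: each BEV cell is treated as a point on a flat ground plane (z = 0), projected into one camera with a pinhole model (zero skew), and its predicted class probabilities are scored with cross-entropy against the pseudo-label at the nearest pixel. The function names, the flat-ground assumption, and nearest-pixel lookup are all hypothetical choices; the temporal consistency term is omitted. Written in an autodiff framework, the same loss would pass gradients back to the BEV probabilities.

```python
import math


def project_ground_point(x, y, K, R, t):
    """Project the ego-frame ground point (x, y, 0) to pixel coordinates.

    K: 3x3 intrinsics, R: 3x3 ego-to-camera rotation, t: ego-to-camera
    translation. Returns (u, v), or None if the point is behind the camera.
    """
    p = (x, y, 0.0)  # assumption: flat ground plane
    cam = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    if cam[2] <= 1e-6:
        return None
    u = (K[0][0] * cam[0] + K[0][2] * cam[2]) / cam[2]  # assumes zero skew
    v = (K[1][1] * cam[1] + K[1][2] * cam[2]) / cam[2]
    return u, v


def reprojection_loss(bev_probs, cell_centers, pseudo_labels, K, R, t):
    """Mean cross-entropy between BEV cell probabilities and the image
    pseudo-label at the pixel each cell projects to."""
    height, width = len(pseudo_labels), len(pseudo_labels[0])
    total, count = 0.0, 0
    for probs, (x, y) in zip(bev_probs, cell_centers):
        uv = project_ground_point(x, y, K, R, t)
        if uv is None:
            continue  # cell is behind this camera
        col, row = int(round(uv[0])), int(round(uv[1]))
        if not (0 <= row < height and 0 <= col < width):
            continue  # cell falls outside the image
        label = pseudo_labels[row][col]
        total += -math.log(max(probs[label], 1e-12))
        count += 1
    return total / count if count else 0.0
```

In a multi-camera setup, one would presumably evaluate such a loss per camera and combine the results, with cells visible in no camera contributing nothing.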


Taxonomy

Topics: Advanced Neural Network Applications · Medical Image Segmentation Techniques · Visual Attention and Saliency Detection