Copy-Pasting Coherent Depth Regions Improves Contrastive Learning for Urban-Scene Segmentation
Liang Zeng, Attila Lengyel, Nergis Tömen, Jan van Gemert

TL;DR
This paper introduces a contrastive learning approach for urban-scene segmentation that uses depth-based copy-paste augmentation to learn context-invariant representations. It surpasses the previous state of the art in unsupervised semantic segmentation without pre-training on ImageNet or COCO.
Contribution
It proposes using estimated depth to group pixels into coherent regions for copy-paste augmentation, which strengthens contrastive learning for urban-scene segmentation.
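The grouping-and-pasting idea can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the authors' implementation: `depth_regions` treats a coherent depth region as a 4-connected component in which neighboring pixels differ in estimated depth by at most a hypothetical threshold `max_step`, and `copy_paste` pastes one such region from a source image onto a target image at the same location, in the style of simple copy-paste. The grouping procedure used in the paper may differ.

```python
from collections import deque
from typing import Any, List, Tuple

Grid = List[List[Any]]


def depth_regions(depth: List[List[float]], max_step: float) -> List[List[int]]:
    """Label 4-connected regions whose neighboring depths differ by <= max_step.

    Hypothetical stand-in for grouping pixels into coherent depth regions.
    Returns a label map with one integer id per region.
    """
    h, w = len(depth), len(depth[0])
    labels = [[-1] * w for _ in range(h)]
    next_id = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue
            # Breadth-first flood fill from an unlabeled seed pixel.
            labels[sy][sx] = next_id
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and abs(depth[ny][nx] - depth[y][x]) <= max_step):
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return labels


def copy_paste(src: Grid, dst: Grid, labels: List[List[int]],
               region_id: int) -> Tuple[Grid, List[List[bool]]]:
    """Paste the pixels of one depth region from src onto a copy of dst.

    Both images must have the same size; returns the new image and the paste mask.
    """
    out = [row[:] for row in dst]
    mask = [[False] * len(dst[0]) for _ in dst]
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab == region_id:
                out[y][x] = src[y][x]
                mask[y][x] = True
    return out, mask


depth = [[1.0, 1.1, 5.0],
         [1.0, 1.2, 5.1],
         [9.0, 9.0, 5.2]]
labels = depth_regions(depth, max_step=0.5)
print(labels)  # [[0, 0, 1], [0, 0, 1], [2, 2, 1]]
```

Here the near road-like patch, the mid-depth column, and the far strip end up as separate regions; any of them can then be pasted into a different image to change its context while its pixels keep their identity.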
Findings
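Improves unsupervised semantic segmentation on Cityscapes by +7.14% mIoU over the previous state-of-the-art baseline
Improves unsupervised semantic segmentation on KITTI by +6.65% mIoU over the previous state-of-the-art baseline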
Does not require pre-training on ImageNet or COCO
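For reference, mIoU is the per-class intersection over union, TP / (TP + FP + FN), averaged over classes. The sketch below is a generic illustration, not the paper's evaluation code. It assumes flat lists of per-pixel class ids, an ignore label of 255 (a common Cityscapes convention), and skips classes absent from both prediction and ground truth. For unsupervised segmentation, predicted cluster ids are typically first mapped to ground-truth classes (e.g. by Hungarian matching); that step is omitted here.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean IoU over classes; predicted ids are assumed to lie in [0, num_classes)."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # ignored pixels count toward no class
        if p == t:
            inter[t] += 1  # true positive
            union[t] += 1
        else:
            union[p] += 1  # false positive for class p
            union[t] += 1  # false negative for class t
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))  # (1/2 + 2/3) / 2 = 0.5833...
```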
Abstract
In this work, we leverage estimated depth to boost self-supervised contrastive learning for segmentation of urban scenes, where unlabeled videos are readily available for training self-supervised depth estimation. We argue that the semantics of a coherent group of pixels in 3D space is self-contained and invariant to the contexts in which they appear. We group coherent, semantically related pixels into coherent depth regions given their estimated depth and use copy-paste to synthetically vary their contexts. In this way, cross-context correspondences are built in contrastive learning and a context-invariant representation is learned. For unsupervised semantic segmentation of urban scenes, our method surpasses the previous state-of-the-art baseline by +7.14% in mIoU on Cityscapes and +6.65% on KITTI. For fine-tuning on Cityscapes and KITTI segmentation, our method is competitive with…
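The cross-context correspondences can be turned into a contrastive objective. The sketch below assumes a standard InfoNCE loss with cosine similarity and a hypothetical temperature `tau`; it is not the paper's exact loss. `anchors[i]` and `positives[i]` stand for embeddings of the same pasted pixel in its original and its new context, and the other positives in the batch act as negatives, so minimizing the loss pulls each pixel's two views together regardless of context.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def info_nce(anchors: List[Sequence[float]], positives: List[Sequence[float]],
             tau: float = 0.1) -> float:
    """Mean InfoNCE loss; positives[i] is the same pixel as anchors[i] in another context."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / tau for j in range(n)]
        # Numerically stable log-sum-exp over the positive and all negatives.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]  # -log softmax probability of the true match
    return total / n


# Matching views: near-zero loss. Mismatched views: large loss.
print(info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
print(info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]))
```

In practice the embeddings would come from a segmentation network evaluated on the original and copy-pasted images, with correspondences given by the paste mask.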
Taxonomy
Topics: Advanced Vision and Imaging · Video Surveillance and Tracking Methods · Robotics and Sensor-Based Localization
Methods: Simple Copy-Paste · Contrastive Learning
