TL;DR
MultiSiam is a self-supervised learning (SSL) approach tailored to multi-instance street-scene data. It improves the generalization of autonomous driving models and outperforms existing SSL methods on Cityscapes and BDD100K.
Contribution
The paper proposes a multi-instance similarity measurement and an intra-image clustering technique for self-supervised learning in complex street scenes, addressing the single-centric-object assumption that existing SSL methods rely on.
Findings
Achieves state-of-the-art transfer performance on Cityscapes and BDD100K.
Outperforms existing SSL methods like MoCo, MoCo-v2, and BYOL.
Models pre-trained on SODA10M surpass ImageNet pre-trained models.
Abstract
Autonomous driving has attracted much attention over the years but turns out to be harder than expected, probably due to the difficulty of labeled data collection for model training. Self-supervised learning (SSL), which leverages unlabeled data only for representation learning, might be a promising way to improve model performance. Existing SSL methods, however, usually rely on the single-centric-object guarantee, which may not be applicable for multi-instance datasets such as street scenes. To alleviate this limitation, we raise two issues to solve: (1) how to define positive samples for cross-view consistency and (2) how to measure similarity in multi-instance circumstances. We first adopt an IoU threshold during random cropping to transfer global-inconsistency to local-consistency. Then, we propose two feature alignment methods to enable 2D feature maps for multi-instance similarity…
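The IoU-thresholded random cropping mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no numbers, so the threshold of 0.5, the crop scale and aspect-ratio ranges, and the function names `random_crop`, `box_iou`, and `sample_view_pair` are all assumptions made for illustration; the paper's actual procedure may differ. The idea shown is rejection sampling: two crops are redrawn until they overlap enough that they plausibly share the same local content.

```python
import random


def random_crop(width, height, scale=(0.2, 1.0), rng=random):
    """Sample a box (x0, y0, x1, y1) covering a random fraction of the image.

    The scale and aspect-ratio ranges are assumed values, not taken from the paper.
    """
    area = rng.uniform(*scale) * width * height
    aspect = rng.uniform(3 / 4, 4 / 3)
    w = min(width, max(1, round((area * aspect) ** 0.5)))
    h = min(height, max(1, round((area / aspect) ** 0.5)))
    x0 = rng.randint(0, width - w)
    y0 = rng.randint(0, height - h)
    return (x0, y0, x0 + w, y0 + h)


def box_iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def sample_view_pair(width, height, iou_threshold=0.5, max_tries=1000, rng=random):
    """Draw two crops, rejecting pairs whose IoU is below the threshold.

    iou_threshold=0.5 is an assumed value for illustration.
    """
    for _ in range(max_tries):
        a = random_crop(width, height, rng=rng)
        b = random_crop(width, height, rng=rng)
        if box_iou(a, b) >= iou_threshold:
            return a, b
    # Fallback (hypothetical): reuse one crop for both views, so IoU is 1.
    a = random_crop(width, height, rng=rng)
    return a, a


if __name__ == "__main__":
    view_a, view_b = sample_view_pair(2048, 1024, rng=random.Random(0))
    print(view_a, view_b, box_iou(view_a, view_b))
```

A higher threshold makes the two views more likely to contain the same instances, at the cost of less diverse view pairs and more rejected samples.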
Taxonomy
Methods: Batch Normalization · Bootstrap Your Own Latent · InfoNCE · Momentum Contrast · Siamese Network
