CYBORGS: Contrastively Bootstrapping Object Representations by Grounding in Segmentation
Renhao Wang, Hang Zhao, Yang Gao

TL;DR
This paper introduces CYBORGS, an end-to-end framework that jointly learns object representations and segmentation masks through contrastive learning, improving pretraining on complex scenes for better transfer to downstream tasks.
Contribution
It presents a novel joint learning approach that iteratively improves segmentation masks and object representations using contrastive loss grounded in segmentation.
Findings
Robust transfer of learned representations to downstream tasks.
Improved segmentation quality during pretraining.
Enhanced performance in classification, detection, and segmentation.
Abstract
Many recent approaches in contrastive learning have worked to close the gap between pretraining on iconic images like ImageNet and pretraining on complex scenes like COCO. This gap exists largely because commonly used random crop augmentations obtain semantically inconsistent content in crowded scene images of diverse objects. Previous works use preprocessing pipelines to localize salient objects for improved cropping, but an end-to-end solution is still elusive. In this work, we propose a framework which accomplishes this goal via joint learning of representations and segmentation. We leverage segmentation masks to train a model with a mask-dependent contrastive loss, and use the partially trained model to bootstrap better masks. By iterating between these two components, we ground the contrastive updates in segmentation information, and simultaneously improve segmentation throughout…
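The alternation described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the function names (`masked_average_pool`, `info_nce`, `mask_contrastive_loss`, `reassign_masks`), the InfoNCE form of the loss, the temperature value, and the nearest-prototype rule for refreshing masks are all assumptions chosen for illustration. The sketch pools per-pixel features inside each mask, contrasts matching regions across two views, and then uses the pooled features to propose new masks.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def masked_average_pool(features, mask):
    """Average the per-pixel feature vectors selected by a 0/1 mask."""
    total = sum(mask)
    if total == 0:
        raise ValueError("mask selects no pixels")
    dim = len(features[0])
    return [sum(m * f[d] for f, m in zip(features, mask)) / total
            for d in range(dim)]

def info_nce(queries, keys, temperature=0.2):
    """InfoNCE: queries[i] should match keys[i]; all other keys are negatives."""
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    loss = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, kj) / temperature for kj in k]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(q)

def mask_contrastive_loss(feats_a, feats_b, masks_a, masks_b, temperature=0.2):
    """Contrast mask-pooled region features from two views of one image.

    Assumption: masks_a[i] and masks_b[i] cover the same region in each view.
    """
    pooled_a = [masked_average_pool(feats_a, m) for m in masks_a]
    pooled_b = [masked_average_pool(feats_b, m) for m in masks_b]
    return info_nce(pooled_a, pooled_b, temperature)

def reassign_masks(features, prototypes):
    """Hypothetical bootstrapping step: give each pixel to its most similar
    region prototype (cosine similarity) and return the new 0/1 masks."""
    protos = [l2_normalize(p) for p in prototypes]
    masks = [[0] * len(features) for _ in protos]
    for idx, f in enumerate(features):
        sims = [dot(l2_normalize(f), p) for p in protos]
        masks[sims.index(max(sims))][idx] = 1
    return masks

# Toy image with four "pixels": two belong to one object, two to another.
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
masks = [[1, 1, 0, 0], [0, 0, 1, 1]]

aligned_loss = mask_contrastive_loss(feats, feats, masks, masks)
swapped_loss = mask_contrastive_loss(feats, feats, masks, masks[::-1])

# Use the pooled region features as prototypes to propose refreshed masks.
prototypes = [masked_average_pool(feats, m) for m in masks]
new_masks = reassign_masks(feats, prototypes)
```

In this toy setup the loss is lower when the masks of the two views refer to the same regions than when they are swapped, which is the sense in which the contrastive update is grounded in the masks. In a real training loop, an encoder would produce the features, and the mask refresh would run periodically on the partially trained model's outputs.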
Taxonomy
Topics: Visual Attention and Saliency Detection · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Contrastive Learning
