UniVIP: A Unified Framework for Self-Supervised Visual Pre-training

Zhaowen Li; Yousong Zhu; Fan Yang; Wei Li; Chaoyang Zhao; Yingying Chen; Zhiyang Chen; Jiahao Xie; Liwei Wu; Rui Zhao; Ming Tang; Jinqiao Wang

arXiv:2203.06965·cs.CV·March 15, 2022

Open Access

TL;DR

UniVIP is a versatile self-supervised learning framework that effectively captures scene and instance relationships, achieving state-of-the-art results across multiple visual tasks on diverse datasets.

Contribution

The paper introduces UniVIP, a unified SSL framework that models scene and instance correlations at three levels, improving transfer learning and detection performance.

Findings

1. Achieves state-of-the-art transfer performance on COCO and ImageNet.

2. Outperforms BYOL by 2.5% in linear probing.

3. Surpasses existing self-supervised object detection methods.

Abstract

Self-supervised learning (SSL) holds promise in leveraging large amounts of unlabeled data. However, the success of popular SSL methods has been limited to single-centric-object images like those in ImageNet, and these methods ignore the correlation between the scene and its instances, as well as the semantic differences among instances in the scene. To address these problems, we propose Unified Self-supervised Visual Pre-training (UniVIP), a novel self-supervised framework for learning versatile visual representations on either single-centric-object or non-iconic datasets. The framework considers representation learning at three levels: 1) scene-scene similarity, 2) scene-instance correlation, and 3) instance-instance discrimination. During learning, we adopt the optimal transport algorithm to automatically measure the discrimination of instances. Extensive experiments show that…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications

Methods: Bootstrap Your Own Latent