Inter-Instance Similarity Modeling for Contrastive Learning
Chengchao Shen, Dawei Liu, Hao Tang, Zhe Qu, Jianxin Wang

TL;DR
This paper introduces PatchMix, a novel image mixing technique for contrastive learning with Vision Transformers, which models inter-instance similarities more effectively, leading to improved performance on multiple datasets and tasks.
Contribution
PatchMix enables flexible patch-level mixing of more than two images, which captures inter-instance similarities better and narrows the gap between the contrastive objective and the true similarity relations among natural images.
Findings
Outperforms the previous state of the art on ImageNet-1K with a 3.0% accuracy gain
Achieves an 8.7% improvement in kNN accuracy on CIFAR100
Enhances transfer learning performance on COCO detection and segmentation
Abstract
Existing contrastive learning methods widely adopt one-hot instance discrimination as the pretext task for self-supervised learning, which inevitably neglects the rich inter-instance similarities among natural images, leading to potential representation degeneration. In this paper, we propose a novel image mixing method, PatchMix, for contrastive learning in Vision Transformer (ViT), to model inter-instance similarities among images. Following the nature of ViT, we randomly mix multiple images from a mini-batch at the patch level to construct mixed image patch sequences for ViT. Compared to existing sample mix methods, PatchMix can flexibly and efficiently mix more than two images and simulate more complicated similarity relations among natural images. In this manner, our contrastive framework can significantly reduce the gap between the contrastive objective and the ground truth in reality.…
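The patch-level mixing described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `patch_mix`, the parameter `num_mix`, the rule that mixed sequence i draws from images i, i+1, ... (modulo the batch size), and the soft-target matrix are all hypothetical choices made for the example. Images are assumed to be already split into patch sequences of shape (B, N, D).

```python
import numpy as np


def patch_mix(patches, num_mix, rng):
    """Mix patch sequences from `num_mix` images of a mini-batch (illustrative sketch).

    patches: array of shape (B, N, D) with B images, N patches each, D values per patch.
    Returns (mixed, targets):
      mixed   has shape (B, N, D); each position holds a patch from one source image.
      targets has shape (B, B); targets[i, j] is the fraction of patches in
              mixed[i] that were taken from image j (a soft similarity target).
    """
    B, N, _ = patches.shape
    if not 1 <= num_mix <= B:
        raise ValueError("num_mix must be between 1 and the batch size")

    mixed = np.empty_like(patches)
    targets = np.zeros((B, B))
    positions = np.arange(N)
    for i in range(B):
        # Randomly assign every patch position to one of num_mix groups
        # of (nearly) equal size.
        groups = rng.permutation(N) % num_mix
        # Assumption: group g of mixed sequence i draws from image (i + g) mod B.
        sources = (i + groups) % B
        # Each patch keeps its original position; only its source image changes.
        mixed[i] = patches[sources, positions]
        targets[i] = np.bincount(sources, minlength=B) / N
    return mixed, targets


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch = rng.normal(size=(4, 16, 8))  # 4 images, 16 patches, 8 dims per patch
    mixed, targets = patch_mix(batch, num_mix=3, rng=rng)
    print(mixed.shape)
    print(targets)
```

Each row of `targets` sums to 1 and has `num_mix` non-zero entries, so it can replace the one-hot label of instance discrimination with a soft target that reflects how much of each source image a mixed sequence contains.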
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Remote-Sensing Image Classification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Layer Normalization · Adam · Residual Connection · Softmax
