Improving Contrastive Learning on Imbalanced Seed Data via Open-World Sampling
Ziyu Jiang, Tianlong Chen, Ting Chen, Zhangyang Wang

TL;DR
This paper introduces MAK, a principled data sampling framework for contrastive learning that strategically selects unlabeled external data to improve representation quality and class balance, especially in open-world, imbalanced scenarios.
Contribution
We propose MAK, a novel open-world data sampling method based on three principles—tailness, proximity, and diversity—that enhances contrastive learning with external unlabeled data.
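As a rough illustration of how the three principles could be combined into a single selection rule, the sketch below greedily picks external samples by a weighted score. It is not the authors' algorithm: the function name `select_external`, the weights, the additive scoring, and the caller-supplied `tail_scores` array (any per-sample estimate of how under-represented a sample is) are all assumptions made for this example.

```python
import numpy as np

def select_external(seed_feats, ext_feats, tail_scores, budget,
                    w_div=1.0, w_tail=1.0, w_prox=1.0):
    """Hypothetical greedy sampler combining tailness, proximity, and diversity.

    seed_feats:  (n_seed, d) features of the seed dataset.
    ext_feats:   (n_ext, d) features of the external candidates.
    tail_scores: (n_ext,) assumed per-candidate tailness estimate (higher = more tail-like).
    Returns a list of selected candidate indices.
    """
    # L2-normalize so Euclidean distance behaves like a cosine-style distance.
    seed = seed_feats / np.maximum(np.linalg.norm(seed_feats, axis=1, keepdims=True), 1e-12)
    ext = ext_feats / np.maximum(np.linalg.norm(ext_feats, axis=1, keepdims=True), 1e-12)

    # Proximity: distance from each candidate to its nearest seed sample (smaller is better).
    d_seed = np.linalg.norm(ext[:, None, :] - seed[None, :, :], axis=2).min(axis=1)

    n = len(ext)
    available = np.ones(n, dtype=bool)
    # Diversity: distance to the nearest already-selected sample; zero until the first pick.
    div = np.zeros(n)
    selected = []

    for _ in range(min(budget, n)):
        score = w_div * div + w_tail * np.asarray(tail_scores, dtype=float) - w_prox * d_seed
        score[~available] = -np.inf          # never pick the same candidate twice
        i = int(np.argmax(score))
        selected.append(i)
        available[i] = False
        # Farthest-point style update of the diversity term.
        d_new = np.linalg.norm(ext - ext[i], axis=1)
        div = d_new if len(selected) == 1 else np.minimum(div, d_new)

    return selected
```

The diversity term follows the familiar farthest-point (k-center style) greedy update, so near-duplicate candidates are penalized once one of them has been chosen. How the three terms are actually defined and weighted in MAK is not stated in this summary.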
Findings
MAK improves representation quality on ImageNet-100-LT.
MAK achieves better class balancedness in learned features.
MAK enhances performance in both full-shot and few-shot settings.
Abstract
Contrastive learning approaches have achieved great success in learning visual representations with few labels of the target classes. This suggests a tantalizing possibility: scaling them up beyond a curated "seed" benchmark by incorporating more unlabeled images from internet-scale external sources to enhance their performance. In practice, however, a larger amount of unlabeled data requires more computing resources because of the bigger model size and longer training needed. Moreover, open-world unlabeled data usually follows an implicit long-tail class or attribute distribution, and many of the samples do not belong to the target classes. Blindly leveraging all unlabeled data can therefore lead to data imbalance as well as distraction issues. This motivates us to seek a principled approach to strategically select unlabeled data from an external source, in order to learn generalizable,…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications
