On the Effectiveness of Out-of-Distribution Data in Self-Supervised Long-Tail Learning
Jianhong Bai, Zuozhu Liu, Hualiang Wang, Jin Hao, Yang Feng, Huanpeng Chu, Haoji Hu

TL;DR
This paper introduces COLT, a novel SSL method that leverages out-of-distribution (OOD) data to improve long-tail learning. It effectively re-balances the feature space and outperforms methods that rely on expensive in-domain data.
Contribution
The paper proposes a new SSL approach that uses OOD data with a dynamic sampling strategy and a contrastive loss to enhance long-tail learning without requiring large-scale in-domain data.
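To make the contrastive component concrete, here is a minimal NumPy sketch of a standard SimCLR-style NT-Xent loss computed over a batch that simply concatenates ID and OOD samples. It is a generic illustration, not COLT's actual objective. The function name `nt_xent`, the temperature `tau=0.2`, the noise-based "augmentation", and the use of raw random vectors in place of encoder outputs are all assumptions made for illustration.

```python
import numpy as np

def nt_xent(z1: np.ndarray, z2: np.ndarray, tau: float = 0.2) -> float:
    """SimCLR-style NT-Xent loss for two views of the same N samples.

    Row i of z1 and row i of z2 form a positive pair; the other 2N-2 rows
    of the combined batch act as negatives. (Generic sketch, not COLT's loss.)
    """
    z = np.concatenate([z1, z2], axis=0)               # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalise each row
    sim = z @ z.T / tau                                 # temperature-scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                      # a sample is never its own positive/negative
    n = z1.shape[0]
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])  # column of each row's positive
    # numerically stable row-wise log-softmax
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())

# Toy usage: hypothetical ID and OOD feature vectors mixed into one batch.
rng = np.random.default_rng(0)
id_batch = rng.normal(size=(6, 16))     # stand-in for ID embeddings
ood_batch = rng.normal(size=(2, 16))    # stand-in for sampled OOD embeddings
x = np.concatenate([id_batch, ood_batch])
view1 = x + 0.05 * rng.normal(size=x.shape)   # small noise as a stand-in for augmentation
view2 = x + 0.05 * rng.normal(size=x.shape)
loss = nt_xent(view1, view2)
```

Mixing OOD samples into the batch this way only adds them as extra positives (for their own views) and extra negatives for every other sample; any term that treats ID and OOD samples differently would have to be added on top and is not shown here.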
Findings
Significant performance improvements on long-tailed datasets.
Outperforms methods using external in-domain data.
Effective OOD sampling strategy enhances SSL robustness.
Abstract
Though self-supervised learning (SSL) has been widely studied as a promising technique for representation learning, it does not generalize well on long-tailed datasets because the majority classes dominate the feature space. Recent work shows that long-tailed learning performance can be boosted by sampling extra in-domain (ID) data for self-supervised training; however, large-scale ID data that can rebalance the minority classes are expensive to collect. In this paper, we propose an alternative, easy-to-use, and effective solution, Contrastive with Out-of-distribution (OOD) data for Long-Tail learning (COLT), which can effectively exploit OOD data to dynamically re-balance the feature space. We empirically identify the counter-intuitive usefulness of OOD samples in SSL long-tailed learning and, building on this finding, design a novel SSL method in a principled way. Concretely, we first localize the 'head' and…
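The idea of dynamically re-balancing the feature space with OOD data can be illustrated with a hypothetical per-epoch sampler. The sketch below is not the paper's scoring rule (that part of the abstract is truncated here). It assumes a made-up proxy in which an OOD sample scores higher when its k nearest ID neighbours are less similar to it (a sparser region of ID feature space), then draws a fixed budget of distinct OOD samples with probability proportional to exp(score / temp). The names `sparsity_scores` and `sample_ood` and the parameters `k`, `budget`, and `temp` are illustrative assumptions.

```python
import numpy as np

def sparsity_scores(ood: np.ndarray, id_feats: np.ndarray, k: int = 5) -> np.ndarray:
    """Hypothetical proxy score (not COLT's): OOD samples whose k nearest ID
    neighbours are less similar, i.e. lie in a sparser ID region, score higher."""
    a = ood / np.linalg.norm(ood, axis=1, keepdims=True)
    b = id_feats / np.linalg.norm(id_feats, axis=1, keepdims=True)
    sim = a @ b.T                          # (M, N) cosine similarity, OOD x ID
    topk = np.sort(sim, axis=1)[:, -k:]    # similarities to the k nearest ID samples
    return -topk.mean(axis=1)              # higher score = sparser neighbourhood

def sample_ood(scores: np.ndarray, budget: int, temp: float,
               rng: np.random.Generator) -> np.ndarray:
    """Draw `budget` distinct OOD indices with probability proportional to exp(score / temp)."""
    logits = scores / temp
    p = np.exp(logits - logits.max())      # subtract max for numerical stability
    p /= p.sum()
    return rng.choice(len(scores), size=budget, replace=False, p=p)

# Toy usage with random stand-in embeddings.
rng = np.random.default_rng(0)
id_feats = rng.normal(size=(100, 16))   # stand-in for current ID embeddings
ood_pool = rng.normal(size=(500, 16))   # stand-in for a large OOD pool
for epoch in range(3):
    # In a real training loop both embedding sets would be recomputed by the
    # encoder each epoch; here they stay fixed, so only the random draw changes.
    scores = sparsity_scores(ood_pool, id_feats, k=5)
    chosen = sample_ood(scores, budget=32, temp=0.1, rng=rng)
```

Re-scoring every epoch is what would make such sampling dynamic: as the encoder's feature space changes, the selected OOD subset changes with it. A lower `temp` concentrates the budget on the highest-scoring samples, while a higher one approaches uniform sampling.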
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Text and Document Classification Technologies
