On the Effectiveness of Out-of-Distribution Data in Self-Supervised Long-Tail Learning

Jianhong Bai; Zuozhu Liu; Hualiang Wang; Jin Hao; Yang Feng; Huanpeng Chu; Haoji Hu

arXiv:2306.04934 · cs.CV · July 13, 2023 · 6 cites

TL;DR

This paper introduces COLT, a self-supervised learning (SSL) method that uses out-of-distribution (OOD) data to improve long-tail learning. It re-balances the feature space and outperforms methods that rely on expensive in-domain data.

Contribution

The paper proposes a new SSL approach that uses OOD data with a dynamic sampling strategy and a contrastive loss to enhance long-tail learning without requiring large-scale in-domain data.
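To make the phrase "contrastive loss" concrete, here is a minimal sketch of a generic InfoNCE-style loss in NumPy. It is not COLT's actual objective, which this page does not spell out; the function name `info_nce_loss`, the `temperature` default, and the idea of simply concatenating in-domain and OOD embeddings into one batch are all illustrative assumptions.

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.5):
    """Generic InfoNCE loss (illustrative, not the paper's exact loss).

    z_a, z_b: arrays of shape (N, D) holding embeddings of two augmented
    views of the same N samples; row i of z_a pairs with row i of z_b.
    """
    # L2-normalise so that dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    # Pairwise similarities, scaled by the temperature: shape (N, N).
    logits = z_a @ z_b.T / temperature

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive for row i sits on the diagonal; all other columns
    # act as negatives.
    return -np.mean(np.diag(log_prob))

# Hypothetical usage: a batch that mixes in-domain (ID) and OOD embeddings.
rng = np.random.default_rng(0)
id_view_a, id_view_b = rng.normal(size=(2, 6, 16))    # 6 ID samples
ood_view_a, ood_view_b = rng.normal(size=(2, 2, 16))  # 2 OOD samples
batch_a = np.concatenate([id_view_a, ood_view_a])
batch_b = np.concatenate([id_view_b, ood_view_b])
loss = info_nce_loss(batch_a, batch_b)
```

In this sketch the OOD samples only enlarge the batch. How the paper selects OOD samples dynamically and how they enter its loss are not described on this page, so those parts are deliberately left out.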

Findings

01 Significant performance improvements on long-tailed datasets.

02 Outperforms methods using external in-domain data.

03 Effective OOD sampling strategy enhances SSL robustness.

Abstract

Although self-supervised learning (SSL) has been widely studied as a promising technique for representation learning, it does not generalize well on long-tailed datasets because the majority classes dominate the feature space. Recent work shows that long-tailed learning performance can be boosted by sampling extra in-domain (ID) data for self-supervised training; however, large-scale ID data that can rebalance the minority classes are expensive to collect. In this paper, we propose an alternative, easy-to-use and effective solution, Contrastive with Out-of-distribution (OOD) data for Long-Tail learning (COLT), which exploits OOD data to dynamically re-balance the feature space. We empirically identify the counter-intuitive usefulness of OOD samples in SSL long-tailed learning and principally design a novel SSL method. Concretely, we first localize the `head' and…

Code & Models

Videos

On the Effectiveness of Out-of-Distribution Data in Self-Supervised Long-Tail Learning · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Text and Document Classification Technologies