Unsupervised Contrastive Learning Using Out-Of-Distribution Data for Long-Tailed Dataset

Cuong Manh Hoang; Yeejin Lee; Byeongkeun Kang

arXiv:2506.12698 · cs.CV · June 17, 2025

Open Access

TL;DR

This paper proposes a novel self-supervised learning approach that leverages out-of-distribution data to improve representation quality on long-tailed datasets, enhancing class balance and separability for image classification.

Contribution

It introduces a method that combines out-of-distribution (OOD) data with contrastive learning and knowledge distillation to address class imbalance in self-supervised learning (SSL).

Findings

01. Outperforms previous state-of-the-art methods on four long-tailed datasets.

02. Effectively learns balanced and well-separated embeddings.

03. Utilizes OOD data to guide contrastive learning and improve representation quality.

Abstract

This work addresses the task of self-supervised learning (SSL) on a long-tailed dataset that aims to learn balanced and well-separated representations for downstream tasks such as image classification. This task is crucial because the real world contains numerous object categories, and their distributions are inherently imbalanced. Towards robust SSL on a class-imbalanced dataset, we investigate leveraging a network trained using unlabeled out-of-distribution (OOD) data that are prevalently available online. We first train a network using both in-domain (ID) and sampled OOD data by back-propagating the proposed pseudo semantic discrimination loss alongside a domain discrimination loss. The OOD data sampling and loss functions are designed to learn a balanced and well-separated embedding space. Subsequently, we further optimize the network on ID data by unsupervised contrastive learning…
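For readers unfamiliar with the contrastive learning step mentioned in the abstract, the following is a minimal sketch of a generic InfoNCE-style contrastive loss in plain Python. It is not the paper's pseudo semantic discrimination loss or domain discrimination loss, whose definitions are not given here. The function names `l2_normalize` and `info_nce_loss`, the temperature value, and the toy embeddings are all assumptions made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce_loss(view_a, view_b, temperature=0.5):
    """Generic InfoNCE loss over a batch of paired embeddings (illustrative only).

    view_a[i] and view_b[i] are embeddings of two augmentations of the same
    image (a positive pair); every view_b[j] with j != i acts as a negative.
    The temperature of 0.5 is an assumed default, not a value from the paper.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Temperature-scaled cosine similarity of the anchor to every candidate.
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature for cand in b]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching index i as the target.
        total += log_denominator - logits[i]
    return total / len(a)


# Toy usage: correctly paired views give a lower loss than mismatched ones.
views = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = info_nce_loss(views, views)
loss_mismatched = info_nce_loss(views, [[0.0, 1.0], [1.0, 0.0]])
```

The loss falls as each embedding moves closer to its own positive and away from the other samples in the batch. On a long-tailed dataset those other samples are drawn mostly from head classes, which is the imbalance the abstract sets out to address.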

Taxonomy

Topics: Machine Learning and ELM · Gaussian Processes and Bayesian Inference · Domain Adaptation and Few-Shot Learning