Unsupervised Learning for Class Distribution Mismatch

Pan Du; Wangbo Zhao; Xinai Lu; Nian Liu; Zhikai Li; Chaoyu Gong; Suyun Zhao; Hong Chen; Cuiping Li; Kai Wang; Yang You

arXiv:2505.06948 · cs.CV · May 13, 2025

TL;DR

This paper introduces UCDM, an unsupervised method for class distribution mismatch that synthesizes positive-negative training pairs with a diffusion model and iteratively pseudo-labels unlabeled data, outperforming previous semi-supervised methods.

Contribution

The paper presents a novel unsupervised approach for class distribution mismatch that does not rely on labeled data, using data synthesis and confidence-based pseudo-labeling.

Findings

01  UCDM outperforms previous semi-supervised methods on multiple datasets.

02  UCDM achieves significant accuracy improvements at a class mismatch proportion of 60%.

03  UCDM classifies known, unknown, and new classes without labeled data.

Abstract

Class distribution mismatch (CDM) refers to the discrepancy between class distributions in training data and target tasks. Previous methods address this by designing classifiers to categorize classes known during training, while grouping unknown or new classes into an "other" category. However, they focus on semi-supervised scenarios and heavily rely on labeled data, limiting their applicability and performance. To address this, we propose Unsupervised Learning for Class Distribution Mismatch (UCDM), which constructs positive-negative pairs from unlabeled data for classifier training. Our approach randomly samples images and uses a diffusion model to add or erase semantic classes, synthesizing diverse training pairs. Additionally, we introduce a confidence-based labeling mechanism that iteratively assigns pseudo-labels to valuable real-world data and incorporates them into the training…
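The confidence-based labeling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pseudo_label_round`, the fixed `threshold` of 0.9, and the use of the maximum class probability as the confidence score are all assumptions made for illustration. The sketch shows one round in which confidently predicted samples receive pseudo-labels and leave the unlabeled pool, while the rest stay behind for later rounds.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def pseudo_label_round(
    unlabeled_ids: Sequence[Hashable],
    probs: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical cutoff, not taken from the paper
) -> Tuple[Dict[Hashable, int], List[Hashable]]:
    """Run one round of confidence-based pseudo-labeling.

    `probs[i]` holds the classifier's class probabilities for
    `unlabeled_ids[i]`. Samples whose top probability reaches
    `threshold` get the top class as a pseudo-label; all others
    remain in the unlabeled pool.
    """
    if len(unlabeled_ids) != len(probs):
        raise ValueError("unlabeled_ids and probs must have the same length")

    newly_labeled: Dict[Hashable, int] = {}
    remaining: List[Hashable] = []
    for sample_id, p in zip(unlabeled_ids, probs):
        p = list(p)
        confidence = max(p)  # assumed confidence score: top class probability
        if confidence >= threshold:
            newly_labeled[sample_id] = p.index(confidence)
        else:
            remaining.append(sample_id)
    return newly_labeled, remaining


# Toy example with three samples and three classes.
ids = ["a", "b", "c"]
probs = [
    [0.95, 0.03, 0.02],  # confident: class 0
    [0.40, 0.35, 0.25],  # not confident: stays unlabeled
    [0.05, 0.05, 0.90],  # confident: class 2
]
labeled, remaining = pseudo_label_round(ids, probs, threshold=0.9)
print(labeled)
print(remaining)
```

In an iterative setup, the pseudo-labeled samples would be added to the training set, the classifier retrained, and the function called again on the remaining pool. How UCDM actually scores confidence and selects "valuable" samples is not specified in the text above.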

Code & Models

Repositories

ruc-dwbi-ml/research (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Imbalanced Data Classification Techniques · Face recognition and analysis

Methods: Diffusion · Focus