Unsupervised Learning for Class Distribution Mismatch
Pan Du, Wangbo Zhao, Xinai Lu, Nian Liu, Zhikai Li, Chaoyu Gong, Suyun Zhao, Hong Chen, Cuiping Li, Kai Wang, Yang You

TL;DR
This paper introduces UCDM, an unsupervised method that effectively handles class distribution mismatch by synthesizing training pairs and iteratively pseudo-labeling unlabeled data, outperforming semi-supervised methods.
Contribution
The paper presents a novel unsupervised approach for class distribution mismatch that does not rely on labeled data, using data synthesis and confidence-based pseudo-labeling.
Findings
UCDM outperforms previous semi-supervised methods on multiple datasets.
UCDM achieves significant accuracy improvements when the class mismatch proportion is 60%.
It classifies known, unknown, and new classes without labeled data.
Abstract
Class distribution mismatch (CDM) refers to the discrepancy between class distributions in training data and target tasks. Previous methods address this by designing classifiers to categorize classes known during training, while grouping unknown or new classes into an "other" category. However, they focus on semi-supervised scenarios and heavily rely on labeled data, limiting their applicability and performance. To address this, we propose Unsupervised Learning for Class Distribution Mismatch (UCDM), which constructs positive-negative pairs from unlabeled data for classifier training. Our approach randomly samples images and uses a diffusion model to add or erase semantic classes, synthesizing diverse training pairs. Additionally, we introduce a confidence-based labeling mechanism that iteratively assigns pseudo-labels to valuable real-world data and incorporates them into the training…
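The confidence-based labeling loop mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.9 threshold, the round limit, and the `predict` callable are all hypothetical, and the retraining step between rounds is only marked with a comment. The sketch shows the general idea of repeatedly moving confidently predicted samples out of an unlabeled pool.

```python
from typing import Callable, List, Sequence, Tuple


def select_confident(
    probs: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Return (sample_index, pseudo_label) for rows whose top probability
    reaches the threshold. The threshold value is an assumption."""
    selected = []
    for i, row in enumerate(probs):
        label = max(range(len(row)), key=row.__getitem__)  # argmax of the row
        if row[label] >= threshold:
            selected.append((i, label))
    return selected


def pseudo_label_rounds(
    predict: Callable[[list], List[List[float]]],
    unlabeled: list,
    threshold: float = 0.9,
    max_rounds: int = 3,
) -> Tuple[list, list]:
    """Iteratively pseudo-label confident samples.

    `predict` is a hypothetical callable that maps a list of samples to a
    list of class-probability rows. Returns (pseudo_labeled, remaining_pool).
    """
    pool = list(unlabeled)
    labeled = []
    for _ in range(max_rounds):
        if not pool:
            break
        picks = select_confident(predict(pool), threshold)
        if not picks:
            break  # nothing confident enough; stop early
        chosen = {i for i, _ in picks}
        labeled.extend((pool[i], y) for i, y in picks)
        pool = [x for i, x in enumerate(pool) if i not in chosen]
        # A full pipeline would retrain the classifier on `labeled` here,
        # so that `predict` improves before the next round.
    return labeled, pool
```

With a toy `predict` such as `lambda xs: [[x, 1 - x] for x in xs]`, samples whose score is close to 0 or 1 are pseudo-labeled in the first round, while ambiguous samples stay in the pool.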
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Imbalanced Data Classification Techniques · Face recognition and analysis
Methods: Diffusion · Focus
