Generalized Category Discovery under the Long-Tailed Distribution
Bingchen Zhao, Kai Han

TL;DR
This paper introduces a framework for generalized category discovery (GCD) on long-tailed data. It addresses class imbalance during classifier learning and an unknown number of categories, and experiments on both long-tailed and conventional GCD datasets show that it is effective.
Contribution
It proposes a method that combines confident sample selection with density-based clustering for GCD under a long-tailed distribution, a setting that had not previously been explored.
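To make the two ingredients concrete, here is a minimal, generic sketch of how confident sample selection and density-based clustering could fit together to estimate a category count. It is not the authors' algorithm: DBSCAN is swapped in as a stand-in density-based method, and the names `select_confident` and `dbscan`, the threshold of 0.9, `eps`, `min_pts`, and the toy data are all assumptions made for illustration.

```python
import math

def select_confident(probs, threshold=0.9):
    """Return indices whose highest predicted probability is >= threshold."""
    return [i for i, p in enumerate(probs) if max(p) >= threshold]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN (stand-in method); returns one label per point, -1 = noise."""
    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise point becomes a border point
            if labels[j] is not None:
                continue             # already assigned, do not expand again
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:   # core point: keep growing the cluster
                queue.extend(nb)
    return labels

# Hypothetical toy data: three groups of unequal size (6, 4, 3 points)
# plus one ambiguous point that the classifier is unsure about.
features = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (-0.1, 0.0),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
    (10.0, 0.0), (10.1, 0.0), (10.0, 0.1),
    (2.5, 2.5),
]
probs = (
    [(0.95, 0.03, 0.02)] * 6
    + [(0.02, 0.96, 0.02)] * 4
    + [(0.03, 0.02, 0.95)] * 3
    + [(0.40, 0.35, 0.25)]
)

confident_idx = select_confident(probs, threshold=0.9)
confident_features = [features[i] for i in confident_idx]
labels = dbscan(confident_features, eps=0.3, min_pts=3)
estimated_categories = len({lab for lab in labels if lab != -1})
print(estimated_categories)
```

In this toy setup the low-confidence point is filtered out before clustering, so it cannot bridge or distort the groups, and the number of distinct non-noise labels serves as the category estimate. Real features would come from a learned embedding, and `eps` and `min_pts` would need tuning so that small tail groups are not discarded as noise.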
Findings
Discovers novel categories in long-tailed unlabelled data
Addresses two challenges specific to this setting: balancing classifier learning and estimating the number of categories
Shown to be effective on both long-tailed and conventional GCD datasets
Abstract
This paper addresses the problem of Generalized Category Discovery (GCD) under a long-tailed distribution, which involves discovering novel categories in an unlabelled dataset using knowledge from a set of labelled categories. Existing works assume a uniform distribution for both datasets, but real-world data often exhibits a long-tailed distribution, where a few categories contain most examples, while others have only a few. While the long-tailed distribution is well-studied in supervised and semi-supervised settings, it remains unexplored in the GCD context. We identify two challenges in this setting - balancing classifier learning and estimating category numbers - and propose a framework based on confident sample selection and density-based clustering to tackle them. Our experiments on both long-tailed and conventional GCD datasets demonstrate the effectiveness of our method.
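The notion of a long-tailed distribution, where a few categories hold most of the examples, can be illustrated with a small sketch. It assumes an exponential-decay profile of class sizes, which is one common way to build long-tailed benchmarks and is not necessarily the construction used in this paper; the function name `long_tailed_class_sizes` and the parameter values are hypothetical.

```python
def long_tailed_class_sizes(num_classes, max_size, imbalance_ratio):
    """Class sizes that decay exponentially from max_size down to
    roughly max_size / imbalance_ratio (assumed profile, for illustration)."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    sizes = []
    for i in range(num_classes):
        frac = i / (num_classes - 1)            # 0 for the head, 1 for the tail
        size = max_size * imbalance_ratio ** (-frac)
        sizes.append(max(1, round(size)))       # keep at least one example
    return sizes

sizes = long_tailed_class_sizes(num_classes=10, max_size=500, imbalance_ratio=100)
head_share = sum(sizes[:3]) / sum(sizes)        # share held by the 3 largest classes
print(sizes, round(head_share, 2))
```

With these assumed values the three largest classes account for well over half of all examples, while the smallest class has only a handful. A classifier trained on such data tends to favour head classes, and small tail clusters are easy to overlook when estimating the number of categories.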
Taxonomy
Topics: Text and Document Classification Technologies · Domain Adaptation and Few-Shot Learning · Imbalanced Data Classification Techniques
