Generalized Category Discovery under the Long-Tailed Distribution

Bingchen Zhao; Kai Han

arXiv:2506.12515·cs.CV·June 23, 2025

TL;DR

This paper introduces a framework for Generalized Category Discovery (GCD) on long-tailed data. It targets two challenges, balanced classifier learning under class imbalance and estimation of the unknown number of categories, and evaluates the approach on both long-tailed and conventional GCD datasets.

Contribution

It proposes a method that combines confident sample selection with density-based clustering for GCD under a long-tailed distribution, a setting the authors describe as previously unexplored.
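
To make the two ingredients concrete, here is a minimal Python sketch. It is not the authors' algorithm: this page does not say how samples are selected or which clustering method is used. The sketch assumes per-class top-k selection by softmax confidence and uses a small hand-written DBSCAN as a stand-in density-based clusterer. The names `select_confident`, `dbscan`, `per_class`, `eps`, and `min_pts` are all hypothetical.

```python
import math

def select_confident(probs, per_class):
    """Assumed heuristic: keep the `per_class` most confident samples for
    each predicted class, so head classes cannot crowd out tail classes."""
    by_class = {}
    for idx, p in enumerate(probs):
        label = max(range(len(p)), key=p.__getitem__)  # argmax of the row
        by_class.setdefault(label, []).append((p[label], idx))
    selected = {}
    for label, scored in by_class.items():
        scored.sort(reverse=True)  # highest confidence first
        selected[label] = [idx for _, idx in scored[:per_class]]
    return selected

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN (stand-in clusterer): one cluster id per point, -1 for noise."""
    n = len(points)
    # Each neighbourhood includes the point itself.
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:  # core point: keep expanding
                queue.extend(neighbours[j])
    return labels

# Toy classifier outputs for five samples and two classes.
probs = [[0.9, 0.1], [0.6, 0.4], [0.8, 0.2], [0.2, 0.8], [0.45, 0.55]]
selected = select_confident(probs, per_class=2)

# Toy 2-D features: a dense "head" group, a small "tail" group, one outlier.
points = [(0, 0), (0.3, 0.1), (0.1, 0.4), (0.5, 0.5), (0.2, 0.6), (0.6, 0.2),
          (5, 5), (5.3, 5.1), (5.1, 5.4), (10, 0)]
labels = dbscan(points, eps=1.0, min_pts=3)
estimated_categories = len({c for c in labels if c != -1})
```

In this toy setup the cluster count serves as the category-number estimate. Note that `min_pts` has to be small enough for the three-point tail group to count as dense, which illustrates why category estimation is harder when some classes have only a few examples.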

Findings

01. Discovers novel categories in long-tailed data

02. Addresses both balanced classifier learning and category-number estimation

03. Demonstrates effectiveness on both long-tailed and conventional GCD datasets

Abstract

This paper addresses the problem of Generalized Category Discovery (GCD) under a long-tailed distribution, which involves discovering novel categories in an unlabelled dataset using knowledge from a set of labelled categories. Existing works assume a uniform distribution for both datasets, but real-world data often exhibits a long-tailed distribution, where a few categories contain most examples, while others have only a few. While the long-tailed distribution is well-studied in supervised and semi-supervised settings, it remains unexplored in the GCD context. We identify two challenges in this setting - balancing classifier learning and estimating category numbers - and propose a framework based on confident sample selection and density-based clustering to tackle them. Our experiments on both long-tailed and conventional GCD datasets demonstrate the effectiveness of our method.
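
As an illustration of what "a few categories contain most examples" means, the sketch below builds per-class sample counts with an exponential decay from head to tail, a profile commonly used to construct long-tailed benchmarks. The abstract does not state which profile the paper uses, so the function `long_tailed_counts` and its parameters `max_count` and `imbalance_ratio` are assumptions made for this example.

```python
def long_tailed_counts(num_classes, max_count, imbalance_ratio):
    """Per-class sample counts that decay exponentially from head to tail.

    Class 0 keeps `max_count` samples; the last class keeps roughly
    `max_count / imbalance_ratio` samples.
    """
    if num_classes == 1:
        return [max_count]
    counts = []
    for i in range(num_classes):
        frac = i / (num_classes - 1)  # 0.0 for the head class, 1.0 for the tail class
        counts.append(max(1, round(max_count * imbalance_ratio ** (-frac))))
    return counts

counts = long_tailed_counts(num_classes=10, max_count=500, imbalance_ratio=100)

# Share of all samples held by the three largest classes.
head_share = sum(counts[:3]) / sum(counts)
```

With these assumed settings the largest class has 500 samples and the smallest has 5, and the three head classes hold well over half of the data. A classifier trained naively on such data sees tail classes rarely, which is the imbalance the abstract refers to.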

Taxonomy

Topics: Text and Document Classification Technologies · Domain Adaptation and Few-Shot Learning · Imbalanced Data Classification Techniques