Long-Tailed Learning for Generalized Category Discovery

Cuong Manh Hoang

arXiv:2506.06965·cs.AI·July 31, 2025

TL;DR

This paper introduces a framework for generalized category discovery on long-tailed datasets. It addresses class imbalance with self-guided pseudo-labeling and representation balancing, and reports improved performance over previous methods.

Contribution

The paper proposes a new framework with self-guided pseudo-labeling and representation balancing to enhance GCD in imbalanced, long-tailed datasets, outperforming existing approaches.

Findings

01

The proposed method surpasses previous state-of-the-art results.

02

The framework effectively handles class imbalance in real-world datasets.

03

Experimental results validate the approach's robustness and accuracy.

Abstract

Generalized Category Discovery (GCD) utilizes labeled samples of known classes to discover novel classes in unlabeled samples. Existing methods show effective performance on artificial datasets with balanced distributions. However, real-world datasets are always imbalanced, significantly affecting the effectiveness of these methods. To solve this problem, we propose a novel framework that performs generalized category discovery in long-tailed distributions. We first present a self-guided labeling technique that uses a learnable distribution to generate pseudo-labels, resulting in less biased classifiers. We then introduce a representation balancing process to derive discriminative representations. By mining sample neighborhoods, this process encourages the model to focus more on tail classes. We conduct experiments on public datasets to demonstrate the effectiveness of the proposed…
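
The abstract does not say how the learnable distribution enters the pseudo-labels, so the following is only a rough illustration of the general idea and not the paper's method. It swaps in a standard distribution-alignment step: predicted class probabilities are rescaled by the ratio of a target class prior to a running average of the model's own predictions, which shifts pseudo-label mass away from over-predicted head classes. The function names, the fixed `target_prior`, and the momentum value are all assumptions made for this sketch; in the paper the distribution is learned rather than fixed.

```python
import numpy as np

def update_running_mean(running_mean, probs, momentum=0.9):
    """Exponential moving average of the model's mean prediction per class."""
    return momentum * running_mean + (1.0 - momentum) * probs.mean(axis=0)

def align_pseudo_labels(probs, target_prior, running_mean):
    """Rescale predictions by target_prior / running_mean, then renormalize rows.

    Hypothetical helper: a generic distribution-alignment step, not the
    self-guided labeling procedure from the paper.
    """
    ratio = target_prior / np.clip(running_mean, 1e-8, None)
    aligned = probs * ratio[None, :]
    return aligned / aligned.sum(axis=1, keepdims=True)

# Toy batch: 8 samples, 4 classes, logits biased toward class 0 (a "head" class).
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 4)) + np.array([2.0, 0.0, 0.0, 0.0])
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Assumed long-tailed prior; in the paper this distribution would be learned.
target_prior = np.array([0.4, 0.3, 0.2, 0.1])
running_mean = update_running_mean(np.full(4, 0.25), probs)
pseudo = align_pseudo_labels(probs, target_prior, running_mean)
```

Each row of `pseudo` remains a valid probability vector, and when the running average already matches the target prior the predictions pass through unchanged.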
