Generalized Category Discovery in Event-Centric Contexts: Latent Pattern Mining with LLMs
Yi Luo, Qiwen Wang, Junqi Yang, Luyao Tang, Zhenghao Lin, Zhenzhe Ying, Weiqiang Wang, Chen Lin

TL;DR
This paper introduces PaMA, a framework that leverages large language models to improve generalized category discovery in complex, event-centric narratives with imbalanced classes, achieving H-score gains of up to 12.58% over prior methods.
Contribution
The paper presents PaMA, a new LLM-based method for event-centric GCD that addresses subjective clustering criteria and class imbalance, and validates it on both new and existing benchmarks.
Findings
PaMA outperforms prior methods with up to 12.58% H-score improvements.
It maintains strong generalization on standard GCD datasets.
It handles complex narratives and class imbalance effectively.
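The findings are reported as H-score gains. In GCD evaluation the H-score is commonly taken to be the harmonic mean of accuracy on known classes and accuracy on novel classes; the sketch below assumes that definition. It also assumes predictions have already been aligned to ground-truth label ids (in practice, discovered clusters are usually matched to labels, e.g. with the Hungarian algorithm), so it illustrates the metric rather than reproducing the paper's evaluation code. The helper names are hypothetical.

```python
from typing import Hashable, Iterable, Sequence, Set


def subset_accuracy(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], classes: Set[Hashable]
) -> float:
    """Accuracy restricted to samples whose ground-truth label is in `classes`."""
    idx = [i for i, y in enumerate(y_true) if y in classes]
    if not idx:
        return 0.0
    return sum(y_true[i] == y_pred[i] for i in idx) / len(idx)


def h_score(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], known_classes: Iterable[Hashable]
) -> float:
    """Harmonic mean of known-class and novel-class accuracy (assumed definition).

    Assumes y_pred is already aligned to ground-truth label ids,
    e.g. after Hungarian matching of discovered clusters.
    """
    known = set(known_classes)
    novel = set(y_true) - known  # every ground-truth class not seen during training
    acc_known = subset_accuracy(y_true, y_pred, known)
    acc_novel = subset_accuracy(y_true, y_pred, novel)
    if acc_known + acc_novel == 0:
        return 0.0
    return 2 * acc_known * acc_novel / (acc_known + acc_novel)


# Known classes {0, 1}: 3 of 4 correct; novel class {2}: 2 of 2 correct.
print(round(h_score([0, 0, 1, 1, 2, 2], [0, 0, 1, 0, 2, 2], {0, 1}), 3))  # → 0.857
```

Because a harmonic mean is pulled toward the smaller of its two inputs, a method cannot raise the H-score by performing well on known classes alone; it must also recover the novel ones.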
Abstract
Generalized Category Discovery (GCD) aims to classify both known and novel categories using partially labeled data that contains only known classes. Despite achieving strong performance on existing benchmarks, current textual GCD methods lack sufficient validation in realistic settings. We introduce Event-Centric GCD (EC-GCD), characterized by long, complex narratives and highly imbalanced class distributions, which poses two main challenges: (1) divergent clustering versus classification groupings caused by subjective criteria, and (2) unfair alignment for minority classes. To tackle these, we propose PaMA, a framework leveraging LLMs to extract and refine event patterns for improved cluster-class alignment. Additionally, a ranking-filtering-mining pipeline ensures balanced representation of prototypes across imbalanced categories. Evaluations on two EC-GCD benchmarks, including a newly…
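The abstract names a ranking-filtering-mining pipeline for balanced prototype representation but does not describe its internals here. As a hypothetical illustration only, the sketch below shows how a per-class quota keeps majority classes from crowding out minority ones. The function names, the confidence scores, the quota `k`, the threshold `min_score`, and the use of an unlabeled pool for the mining step are all assumptions, not PaMA's actual procedure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical candidate schema: (class_label, pattern_text, confidence_score)
Candidate = Tuple[str, str, float]


def _rank_and_filter(
    cands: List[Candidate], min_score: float
) -> Dict[str, List[Tuple[float, str]]]:
    """Drop low-confidence candidates, then sort each class's candidates by score."""
    by_class: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for label, text, score in cands:
        if score >= min_score:  # filtering step
            by_class[label].append((score, text))
    for label in by_class:
        by_class[label].sort(key=lambda c: c[0], reverse=True)  # ranking step
    return by_class


def select_balanced_prototypes(
    labeled: List[Candidate], unlabeled: List[Candidate], k: int, min_score: float
) -> Dict[str, List[str]]:
    """Pick at most k prototypes per class so majority classes cannot dominate.

    Classes left with fewer than k labeled candidates after filtering are
    topped up ("mined") from a secondary, unlabeled pool. Hypothetical sketch.
    """
    primary = _rank_and_filter(labeled, min_score)
    secondary = _rank_and_filter(unlabeled, min_score)
    selected: Dict[str, List[str]] = {}
    for label in sorted(set(primary) | set(secondary)):
        # Per-class quota: every class contributes at most k prototypes.
        chosen = [text for _, text in primary.get(label, [])][:k]
        # Mining step: only short (typically minority) classes draw on the extra pool.
        for _, text in secondary.get(label, []):
            if len(chosen) >= k:
                break
            if text not in chosen:
                chosen.append(text)
        selected[label] = chosen
    return selected


labeled = [
    ("majority", "a", 0.9), ("majority", "b", 0.8), ("majority", "c", 0.7),
    ("majority", "d", 0.2), ("minority", "e", 0.95),
]
unlabeled = [("minority", "f", 0.6), ("minority", "g", 0.3), ("majority", "h", 0.99)]
print(select_balanced_prototypes(labeled, unlabeled, k=2, min_score=0.5))
# → {'majority': ['a', 'b'], 'minority': ['e', 'f']}
```

The majority class has four labeled candidates but is capped at two, while the minority class, which has only one labeled candidate, is topped up from the unlabeled pool, so both end up equally represented.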
Taxonomy
Topics: Text and Document Classification Technologies · Topic Modeling · Imbalanced Data Classification Techniques
Methods: Balanced Selection
