A Generic Method for Fine-grained Category Discovery in Natural Language Texts
Chang Tian, Matthew B. Blaschko, Wenpeng Yin, Mingzhe Xing, Yinliang Yue, Marie-Francine Moens

TL;DR
This paper presents a novel contrastive learning method that improves fine-grained category discovery in texts by leveraging semantic similarities in a logarithmic space, with a centroid inference mechanism for real-time detection, outperforming existing methods.
Contribution
Introduces a new objective function that guides sample clustering using semantic similarities in a logarithmic space, together with a centroid inference mechanism for real-time applications.
Findings
Outperforms state-of-the-art methods on accuracy, Adjusted Rand Index (ARI), and Normalized Mutual Information (NMI)
Validated on three benchmark tasks
Theoretically justified and empirically confirmed
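For readers unfamiliar with ARI, one of the clustering metrics named above, the following is a minimal sketch of the standard Adjusted Rand Index formula using only the Python standard library. It is not taken from the paper's evaluation code; the function name `adjusted_rand_index` and the toy labels are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Standard ARI between two labelings of the same samples (illustrative sketch)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same samples")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: treat as trivially identical

    # Count sample pairs that share a cell, a true class, or a predicted cluster.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings form a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy labels (hypothetical): cluster ids are arbitrary, so a relabeled but
# identical partition still counts as perfect agreement.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because ARI corrects for chance, a partition that cuts across the true categories can score below zero, which raw pair-counting agreement cannot show.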
Abstract
Fine-grained category discovery using only coarse-grained supervision is a cost-effective yet challenging task. Previous training methods focus on aligning query samples with positive samples and distancing them from negatives. They often neglect intra-category and inter-category semantic similarities of fine-grained categories when navigating sample distributions in the embedding space. Furthermore, some evaluation techniques that rely on pre-collected test samples are inadequate for real-time applications. To address these shortcomings, we introduce a method that successfully detects fine-grained clusters of semantically similar texts guided by a novel objective function. The method uses semantic similarities in a logarithmic space to guide sample distributions in the Euclidean space and to form distinct clusters that represent fine-grained categories. We also propose a centroid…
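The abstract is cut off before it describes the centroid mechanism, so the sketch below only illustrates the general idea of centroid-based inference under stated assumptions: each discovered cluster is summarized by the mean of its training embeddings, and a new sample is assigned to the nearest centroid by Euclidean distance. The function names, the toy two-dimensional embeddings, and the nearest-centroid rule itself are assumptions for illustration, not the authors' exact procedure.

```python
import math


def compute_centroids(embeddings, cluster_ids):
    """Average the embeddings of each cluster (assumed centroid definition)."""
    sums, counts = {}, {}
    for vec, cid in zip(embeddings, cluster_ids):
        if cid not in sums:
            sums[cid] = [0.0] * len(vec)
            counts[cid] = 0
        for i, x in enumerate(vec):
            sums[cid][i] += x
        counts[cid] += 1
    return {cid: [s / counts[cid] for s in sums[cid]] for cid in sums}


def assign_to_nearest_centroid(vec, centroids):
    """Return the id of the centroid closest to vec in Euclidean distance."""
    return min(centroids, key=lambda cid: math.dist(vec, centroids[cid]))


# Hypothetical embeddings; in practice these would come from a trained encoder
# and the cluster ids from clustering the training set.
train_embeddings = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
train_cluster_ids = [0, 0, 1, 1]
centroids = compute_centroids(train_embeddings, train_cluster_ids)

# A single incoming sample is labeled without re-clustering a test set.
predicted = assign_to_nearest_centroid([1.0, 1.0], centroids)
```

The appeal for real-time use is that each incoming text needs one encoder pass and a distance comparison against a fixed set of centroids, rather than a pre-collected batch of test samples.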
Taxonomy
Topics: Natural Language Processing Techniques · Text and Document Classification Technologies · Advanced Text Analysis Techniques
Methods: Focus · Contrastive Learning
