UCTopic: Unsupervised Contrastive Learning for Phrase Representations and Topic Mining
Jiacheng Li, Jingbo Shang, Julian McAuley

TL;DR
UCTopic is an unsupervised contrastive learning framework that learns context-aware phrase representations for topic mining, outperforming the state of the art by 38.2% NMI on clustering tasks.
Contribution
The paper presents UCTopic, a novel unsupervised contrastive learning method with cluster-assisted negatives for improved phrase and topic representations.
Findings
Outperforms state-of-the-art by 38.2% NMI in clustering
Effective in extracting coherent and diverse topical phrases
Reduces noisy negatives with cluster-assisted contrastive learning
Abstract
High-quality phrase representations are essential to finding topics and related terms in documents (a.k.a. topic mining). Existing phrase representation learning methods either simply combine unigram representations in a context-free manner or rely on extensive annotations to learn context-aware knowledge. In this paper, we propose UCTopic, a novel unsupervised contrastive learning framework for context-aware phrase representations and topic mining. UCTopic is pretrained at large scale to distinguish whether the contexts of two phrase mentions have the same semantics. The key to pretraining is positive pair construction from our phrase-oriented assumptions. However, we find that traditional in-batch negatives cause performance decay when finetuning on a dataset with a small number of topics. Hence, we propose cluster-assisted contrastive learning (CCL), which largely reduces noisy negatives by…
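To make the in-batch negative problem concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names, the temperature value, the toy vectors, and the rule of skipping negatives that share the anchor's cluster id are all hypothetical. The truncated abstract does not say how CCL selects negatives, so the cluster mask below is only one plausible reading of "cluster-assisted negatives".

```python
import math
from typing import List, Optional, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def contrastive_loss(
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    clusters: Optional[Sequence[int]] = None,
    temperature: float = 0.1,
) -> float:
    """Mean InfoNCE-style loss over a batch.

    anchors[i] and positives[i] are two views of the same phrase.
    With clusters=None, every other item in the batch is a negative.
    With cluster ids, items sharing the anchor's cluster are skipped
    (an illustrative assumption, not the paper's exact procedure).
    """
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i in range(len(a)):
        pos_logit = dot(a[i], p[i]) / temperature
        logits = [pos_logit]
        for j in range(len(p)):
            if j == i:
                continue
            if clusters is not None and clusters[j] == clusters[i]:
                continue  # probably the same topic: do not push apart
            logits.append(dot(a[i], p[j]) / temperature)
        # Negative log-softmax of the positive, computed stably.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - pos_logit
    return total / len(a)


def same_topic_negative_rate(num_topics: int) -> float:
    """Chance that a random negative shares the anchor's topic,
    assuming topics are uniformly distributed in the batch."""
    return 1.0 / num_topics


# Toy batch: items 0 and 1 belong to one topic, items 2 and 3 to another.
anchors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
positives = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]]
clusters = [0, 0, 1, 1]

in_batch = contrastive_loss(anchors, positives)
cluster_assisted = contrastive_loss(anchors, positives, clusters)
print(in_batch, cluster_assisted, same_topic_negative_rate(2))
```

In the toy batch, plain in-batch negatives penalize each anchor for being close to the other phrase from its own topic, while the masked variant does not. Under the uniform-topic assumption, a dataset with only two topics makes roughly half of all in-batch negatives noisy in this sense, which is the small-topic-number setting the abstract describes.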
Taxonomy
Topics: Advanced Text Analysis Techniques · Topic Modeling · Text and Document Classification Technologies
Methods: Contrastive Learning
