GloCOM: A Short Text Neural Topic Model via Global Clustering Context

Quang Duc Nguyen; Tung Nguyen; Duc Anh Nguyen; Linh Ngo Van; Sang Dinh; Thien Huu Nguyen

arXiv:2412.00525 · cs.CL · January 24, 2025

TL;DR

GloCOM introduces a novel neural topic model for short texts that leverages global clustering contexts derived from pre-trained embeddings to improve topic discovery and document representation quality.

Contribution

The paper presents GloCOM, a new model that constructs global clustering contexts to enhance short text topic modeling, addressing data and label sparsity issues.

Findings

01. Outperforms state-of-the-art models in topic quality
02. Improves document representation accuracy
03. Effectively handles data and label sparsity

Abstract

Uncovering hidden topics from short texts is challenging for traditional and neural models due to data sparsity, which limits word co-occurrence patterns, and label sparsity, stemming from incomplete reconstruction targets. Although data aggregation offers a potential solution, existing neural topic models often overlook it due to time complexity, poor aggregation quality, and difficulty in inferring topic proportions for individual documents. In this paper, we propose a novel model, GloCOM (Global Clustering COntexts for Topic Models), which addresses these challenges by constructing aggregated global clustering contexts for short documents, leveraging text embeddings from pre-trained language models. GloCOM can infer both global topic distributions for clustering contexts and local distributions for individual short texts. Additionally, the model incorporates these global contexts to…
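The aggregation idea described in the abstract can be illustrated with a minimal sketch: cluster short documents by their embeddings, then merge the word counts of each cluster into one longer "global context". Everything here is an assumption made for illustration. The abstract does not say which clustering algorithm is used, so plain k-means with cosine similarity stands in for it; the 2-D vectors stand in for embeddings from a pre-trained language model; and the function names `kmeans_cosine` and `build_global_contexts` are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def kmeans_cosine(vectors, k, iters=20):
    """Toy k-means: assign each vector to the most similar centroid, then re-average.

    Assumption: centroids are initialised from the first k vectors so the
    result is deterministic. A real pipeline would likely use a library
    implementation with better initialisation.
    """
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iters):
        assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def build_global_contexts(tokenized_docs, assignment, k):
    """Sum the bag-of-words counts of all documents in each cluster."""
    contexts = [Counter() for _ in range(k)]
    for tokens, cluster in zip(tokenized_docs, assignment):
        contexts[cluster].update(tokens)
    return contexts


# Toy data: four short texts and stand-in 2-D "embeddings".
docs = [["cheap", "flights"], ["flights", "delayed"], ["python", "bug"], ["bug", "fix"]]
embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]

clusters = kmeans_cosine(embeddings, k=2)
global_contexts = build_global_contexts(docs, clusters, k=2)
for cluster_id, context in enumerate(global_contexts):
    print(cluster_id, dict(context))
```

Each aggregated context contains more word co-occurrences than any single short text in it, which is the intuition behind using clustering to ease data sparsity. How the model then infers global and local topic distributions from these contexts is not covered by this sketch.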

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques