Disentangling Similarity and Relatedness in Topic Models

Hanlin Xiao; Mauricio A. Álvarez; Rainer Breitling

arXiv:2603.10619 · cs.CL · March 12, 2026

Open Access

TL;DR

This paper introduces a method for distinguishing similarity from relatedness in topic models. Using a synthetic word-pair benchmark and a neural scoring function, it shows that different model families capture different semantic structures and that the resulting scores predict downstream task performance.

Contribution

The paper presents a novel approach for disentangling similarity and relatedness in topic models, together with a benchmark and an evaluation pipeline for analysing which kind of semantic structure each model captures.

Findings

01. Different model families capture distinct semantic structures.

02. Similarity and relatedness scores predict downstream task performance.

03. The proposed benchmark enables systematic evaluation of semantic axes.

Abstract

The recent advancement of large language models has spurred a growing trend of integrating pre-trained language model (PLM) embeddings into topic models, fundamentally reshaping how topics capture semantic structure. Classical models such as Latent Dirichlet Allocation (LDA) derive topics from word co-occurrence statistics, whereas PLM-augmented models anchor these statistics to pre-trained embedding spaces, imposing a prior that also favours clustering of semantically similar words. This structural difference can be captured by the psycholinguistic dimensions of thematic relatedness and taxonomic similarity of the topic words. To disentangle these dimensions in topic models, we construct a large synthetic benchmark of word pairs using LLM-based annotation to train a neural scoring function. We apply this scorer to a comprehensive evaluation across multiple corpora and topic model…
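As a rough illustration of the evaluation step described above, the sketch below applies a pairwise scorer to the top words of a single topic and averages the results along each axis. The abstract does not specify how pair scores are aggregated, so the mean over all unordered pairs is an assumption, and `score_topic`, `PairScorer`, and `toy_scorer` are hypothetical names. The toy lookup table merely stands in for the trained neural scoring function; its numbers are made up for the example.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer type: maps a word pair to (similarity, relatedness).
PairScorer = Callable[[str, str], Tuple[float, float]]


def score_topic(top_words: List[str], scorer: PairScorer) -> Dict[str, float]:
    """Average the two scores over all unordered pairs of a topic's top words.

    Assumption: mean aggregation; the paper may aggregate differently.
    """
    pairs = list(combinations(top_words, 2))
    if not pairs:
        raise ValueError("a topic needs at least two words to form a pair")
    sims, rels = zip(*(scorer(a, b) for a, b in pairs))
    return {"similarity": mean(sims), "relatedness": mean(rels)}


def toy_scorer(a: str, b: str) -> Tuple[float, float]:
    """Toy stand-in for the trained scorer; values are illustrative only."""
    table = {
        frozenset(("cat", "dog")): (0.9, 0.6),    # taxonomically similar
        frozenset(("dog", "leash")): (0.1, 0.9),  # thematically related
        frozenset(("cat", "leash")): (0.1, 0.3),
    }
    return table.get(frozenset((a, b)), (0.0, 0.0))


topic_scores = score_topic(["cat", "dog", "leash"], toy_scorer)
```

Under these assumptions, a topic dominated by taxonomic neighbours (such as "cat" and "dog") would score higher on the similarity axis, while a topic built around a shared scene (such as "dog" and "leash") would score higher on relatedness.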

Taxonomy

Topics: Computational and Text Analysis Methods · Topic Modeling · Sentiment Analysis and Opinion Mining