PRISM: LLM-Guided Semantic Clustering for High-Precision Topics

Connor Douglas; Utkucan Balci; Joseph Aylett-Bullock

arXiv:2604.03180·cs.LG·April 6, 2026

TL;DR

PRISM is a topic modeling framework that fine-tunes a sentence encoder on a sparse set of LLM-provided labels, then clusters the resulting embeddings to separate closely related topics at low cost and with interpretable results.

Contribution

The paper introduces a student-teacher pipeline that distills sparse LLM supervision into a lightweight model, and it demonstrates the approach on web-scale text analysis.

Findings

01. PRISM outperforms state-of-the-art local topic models in topic separability.

02. PRISM requires only a small number of LLM queries for training.

03. PRISM enables interpretable, locally deployable web-scale text analysis.

Abstract

In this paper, we propose Precision-Informed Semantic Modeling (PRISM), a structured topic modeling framework combining the benefits of rich representations captured by LLMs with the low cost and interpretability of latent semantic clustering methods. PRISM fine-tunes a sentence encoding model using a sparse set of LLM-provided labels on samples drawn from some corpus of interest. We segment this embedding space with thresholded clustering, yielding clusters that separate closely related topics within some narrow domain. Across multiple corpora, PRISM improves topic separability over state-of-the-art local topic models and even over clustering on large, frontier embedding models while requiring only a small number of LLM queries to train. This work contributes to several research streams by providing (i) a student-teacher pipeline to distill sparse LLM supervision into a lightweight…
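
The thresholded clustering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the algorithm, so everything below is an assumption made for illustration and not the authors' method: the greedy centroid-assignment rule, the hypothetical function name `threshold_cluster`, the cosine-similarity threshold of 0.9, and the toy 2-D vectors. In PRISM the inputs would presumably be embeddings produced by the fine-tuned sentence encoder.

```python
import numpy as np


def threshold_cluster(embeddings: np.ndarray, threshold: float = 0.8) -> list[int]:
    """Greedy cosine-similarity clustering with a hard threshold.

    Hypothetical helper, not taken from the paper. Each vector joins the
    most similar existing cluster if its cosine similarity to that
    cluster's centroid is at least `threshold`; otherwise it starts a
    new cluster.
    """
    # Normalize rows so that dot products equal cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)

    sums: list[np.ndarray] = []  # running sum of member vectors per cluster
    labels: list[int] = []

    for vec in unit:
        best, best_sim = -1, -1.0
        for idx, total in enumerate(sums):
            # Centroid direction is the normalized sum of member vectors.
            centroid = total / max(float(np.linalg.norm(total)), 1e-12)
            sim = float(vec @ centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best >= 0 and best_sim >= threshold:
            sums[best] = sums[best] + vec
            labels.append(best)
        else:
            sums.append(vec.copy())
            labels.append(len(sums) - 1)
    return labels


# Toy 2-D "embeddings" (assumed data): two tight groups in different directions.
points = np.array([
    [1.0, 0.0],
    [0.98, 0.05],
    [0.0, 1.0],
    [0.05, 0.99],
])
labels = threshold_cluster(points, threshold=0.9)
print(labels)  # [0, 0, 1, 1]
```

A higher threshold yields more, tighter clusters, which is one plausible way to trade coverage for precision when separating closely related topics; the number of clusters is not fixed in advance.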
