Summaries as Centroids for Interpretable and Scalable Text Clustering

Jairo Diaz-Rodriguez

arXiv:2502.09667·cs.CL·February 10, 2026

Open Access · 3 Reviews

TL;DR

This paper proposes a novel text clustering method that uses human-readable summaries as centroids, combining interpretability with scalability, and demonstrates its effectiveness across various datasets and streaming scenarios.

Contribution

It introduces k-NLPmeans and k-LLMmeans, innovative clustering algorithms that replace numeric centroids with textual summaries, enhancing interpretability without sacrificing accuracy.

Findings

1. Outperforms classical clustering baselines.
2. Approaches the accuracy of LLM-based clustering.
3. Effective for streaming text data.

Abstract

We introduce k-NLPmeans and k-LLMmeans, text-clustering variants of k-means that periodically replace numeric centroids with textual summaries. The key idea, summary-as-centroid, retains k-means assignments in embedding space while producing human-readable, auditable cluster prototypes. The method is LLM-optional: k-NLPmeans uses lightweight, deterministic summarizers, enabling offline, low-cost, and stable operation; k-LLMmeans is a drop-in upgrade that uses an LLM for summaries under a fixed per-iteration budget whose cost does not grow with dataset size. We also present a mini-batch extension for real-time clustering of streaming text. Across diverse datasets, embedding models, and summarization strategies, our approach consistently outperforms classical baselines and approaches the accuracy of recent LLM-based clustering, without extensive LLM calls. Finally, we provide a case study…
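The summary-as-centroid loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `embed` is a toy bag-of-words embedding, `summarize` is a stand-in extractive summarizer that returns the member document nearest the cluster mean, and the names `summary_kmeans` and `summary_every` are hypothetical. A real setup would presumably swap in a sentence-embedding model and one of the summarizers the paper evaluates.

```python
import math
from collections import Counter

def embed(text, vocab):
    """Toy embedding (assumption): L2-normalised bag-of-words counts over vocab."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def summarize(docs, vecs):
    """Stand-in summarizer (assumption): the member document nearest the cluster mean."""
    center = mean(vecs)
    best = min(range(len(docs)), key=lambda i: sq_dist(vecs[i], center))
    return docs[best]

def summary_kmeans(docs, init_summaries, n_iter=6, summary_every=2):
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vecs = [embed(d, vocab) for d in docs]
    summaries = list(init_summaries)
    centroids = [embed(s, vocab) for s in summaries]
    labels = [0] * len(docs)
    for it in range(1, n_iter + 1):
        # Assignment step: ordinary k-means, done in embedding space.
        labels = [min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vecs]
        for c in range(len(centroids)):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                continue  # leave an empty cluster's centroid unchanged
            m_vecs = [vecs[i] for i in members]
            if it % summary_every == 0:
                # Periodic step: the centroid becomes the embedding of a text summary.
                summaries[c] = summarize([docs[i] for i in members], m_vecs)
                centroids[c] = embed(summaries[c], vocab)
            else:
                # Standard numeric centroid update.
                centroids[c] = mean(m_vecs)
    return labels, summaries

docs = [
    "cats purr and sleep", "cats chase mice", "kittens and cats play",
    "stocks fell today", "stocks and bonds rallied", "investors sold stocks",
]
labels, summaries = summary_kmeans(docs, init_summaries=[docs[0], docs[3]])
print(labels)
print(summaries)
```

Because each centroid is periodically re-derived from a piece of text, every cluster ends the run with a readable prototype in `summaries`, while the assignment step itself is unchanged from k-means.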

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 4

Strengths

- The proposal to use summaries as centroids in k-means clustering appears very innovative and easy to implement.
- The authors include several ways of computing the summary centroid besides simply querying an LLM, such as TextRank, SVD, etc., which can provide computationally cheaper alternatives.
- The approach shows good performance gains over standard k-means clustering, demonstrated consistently across many datasets and many different embedding models (Tables 1-4).

Weaknesses

Not much of a weakness, but a suggestion: I would recommend elaborating on the NMI metric to give readers a brief explanation of how it is calculated.
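For readers unfamiliar with the metric the reviewer mentions, normalized mutual information (NMI) compares a predicted clustering with reference labels: it is the mutual information between the two labelings divided by a normalizer built from their entropies. The sketch below assumes the arithmetic-mean normalizer, which is one common convention; the excerpt here does not say which normalization the paper uses, and the function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information between two labelings of the same items."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        (n_ij / n) * math.log(n_ij * n / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )

def nmi(a, b):
    """NMI with arithmetic-mean normalization (assumption)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both labelings put everything in one cluster
    return mutual_information(a, b) / ((h_a + h_b) / 2)

same_up_to_renaming = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
print(same_up_to_renaming, independent)
```

The score is invariant to how cluster IDs are named, so a clustering that matches the reference up to relabeling scores 1, while a statistically independent one scores 0.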

Reviewer 02 · Rating 2 · Confidence 3

Strengths

In this manuscript, the authors aim to address the problems of poor interpretability of numeric centroids and high scalability costs in traditional text clustering methods. Specifically, the proposed k-NLPmeans uses lightweight and deterministic classical NLP summarizers to periodically replace numeric centroids with textual summaries. The proposed k-LLMmeans leverages LLMs for summaries under a fixed per-iteration budget. Experimental results across diverse datasets and embedding models show th

Weaknesses

There are some concerns about the manuscript, as follows:
1. How is k set in the experiments? The influence of k in k-means on the experimental results is not discussed.
2. The example in Figure 1 is based on the results of k-means. However, in the proposed method, the authors propose a new summarization step to compute a textual prototype in place of the standard centroid update. Thus, how does the proposed method guarantee that the instances in the same cluster can be used to generate promising su

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- **Simple but novel idea:** The notion of introducing interpretable textual centroids inside the k-means loop is elegant, practical, and original. It creates a direct, auditable link between cluster means and human-interpretable summaries.
- **Interpretability without post-hoc processing:** Unlike topic models or LLM-based pipelines that only label clusters afterward, the prototype is the cluster, which is useful for debugging, transparency, and downstream analyst workflows.
- **Low-resource ap

Weaknesses

- **Summarization hints:** Performance depends on the summarizer, especially in heterogeneous clusters. The paper tests several summarizers but does not provide guidance on when one strategy is preferable (e.g., extractive vs. abstractive by dataset characteristics).
- **Missing comparison on interpretability:** Interpretability is a key selling point, but comparisons are mostly against centroid-based clustering. Topic-model-style baselines (e.g., BERTopic) would give a fairer interpretability

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling