Incremental Graph Construction Enables Robust Spectral Clustering of Texts
Marko Pranjić, Boshko Koloski, Nada Lavrač, Senja Pollak, Marko Robnik-Šikonja

TL;DR
This paper presents an incremental $k$-NN graph construction method that guarantees connectivity, improving the robustness of spectral clustering on text embeddings, especially for sparse graphs (small $k$).
Contribution
The authors introduce a simple incremental $k$-NN graph construction technique that ensures connectivity by design, enhancing spectral clustering stability on text datasets.
Findings
Outperforms standard $k$-NN graphs at low $k$ values.
Maintains comparable performance to standard $k$-NN at higher $k$.
Ensures connected graphs for any $k$ through incremental construction.
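The connectivity claim can be illustrated with a short induction. This is only a sketch of how such an argument might go, not the paper's own proof. It assumes $k \ge 1$, undirected edges, and an insertion order $v_1, \dots, v_n$; the notation $G_i$ for the graph after inserting $v_i$ is introduced here for illustration.

```latex
% Sketch only: G_i, v_i and the insertion order are illustrative notation.
\textbf{Claim.} For every $k \ge 1$ and every $i \ge 1$, the graph $G_i$ on
$\{v_1, \dots, v_i\}$ is connected.

\textbf{Base case.} $G_1$ consists of the single node $v_1$ and is trivially connected.

\textbf{Inductive step.} Assume $G_{i-1}$ is connected for some $i \ge 2$.
Node $v_i$ is linked to its $\min(k,\, i-1) \ge 1$ nearest nodes among
$\{v_1, \dots, v_{i-1}\}$, so at least one edge joins $v_i$ to a node of $G_{i-1}$.
Every node of $G_{i-1}$ is reachable from that neighbor by the inductive
hypothesis, hence every node of $G_i$ is reachable from $v_i$, and $G_i$ is connected.
```

The argument uses only the fact that each new node receives at least one edge to an earlier node, so it does not depend on the distance function used to pick neighbors.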
Abstract
Neighborhood graphs are a critical but often fragile step in spectral clustering of text embeddings. On realistic text datasets, standard $k$-NN graphs can contain many disconnected components at practical sparsity levels (small $k$), making spectral clustering degenerate and sensitive to hyperparameters. We introduce a simple incremental $k$-NN graph construction that preserves connectivity by design: each new node is linked to its $k$ nearest previously inserted nodes, which guarantees a connected graph for any $k$. We provide an inductive proof of connectedness and discuss implications for incremental updates when new documents arrive. We validate the approach on spectral clustering of SentenceTransformer embeddings using Laplacian eigenmaps across six clustering datasets from the Massive Text Embedding Benchmark. Compared to standard $k$-NN graphs, our method outperforms in the…
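The construction described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `incremental_knn_edges` and `is_connected` are hypothetical, Euclidean distance stands in for whatever similarity the paper uses, the insertion order is simply the order of the input list, and `k` is assumed to be at least 1.

```python
import math


def incremental_knn_edges(points, k):
    """Link each point to its k nearest *previously inserted* points.

    Assumptions (not from the paper): Euclidean distance, insertion
    order equal to list order, undirected edges stored as (j, i), j < i.
    """
    edges = set()
    for i in range(1, len(points)):
        # Rank all earlier points by distance to the new point i.
        earlier = sorted(range(i), key=lambda j: math.dist(points[i], points[j]))
        # Keep the k closest; fewer than k exist while i < k.
        for j in earlier[:k]:
            edges.add((j, i))
    return edges


def is_connected(n, edges):
    """Union-find check that n nodes form a single component."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(v) for v in range(n)}) == 1


# Two well-separated groups of points; even k = 1 yields one component.
points = [(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100)]
edges = incremental_knn_edges(points, k=1)
print(is_connected(len(points), edges))
```

With `k=1` each inserted point adds exactly one edge to an earlier point, so the result is a spanning tree. A standard $k$-NN graph on the same two groups with `k=1` would leave them in separate components, which is the failure mode the abstract describes.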
Taxonomy
Topics: Advanced Graph Neural Networks · Complex Network Analysis Techniques · Text and Document Classification Technologies
