Clustering and Latent Semantic Indexing Aspects of the Singular Value Decomposition
Andri Mirzal

TL;DR
This paper explores how singular value decomposition (SVD) enhances clustering and latent semantic indexing (LSI) in information retrieval, demonstrating that SVD-based methods improve document relevance filtering and can be approximated by a practical, convergent algorithm.
Contribution
It reveals the shared origin of SVD's clustering and LSI aspects and introduces a new, practical LSI algorithm that mimics SVD's clustering capabilities without needing rank determination.
Findings
SVD-based clustering improves document retrieval relevance.
The proposed LSI algorithm achieves comparable performance to SVD.
The algorithm is practical, convergent, and does not require rank tuning.
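The clustering finding can be illustrated with a minimal sketch. It is not the paper's algorithm: it uses one common heuristic, assigning each document to the leading right singular vector on which it has the largest absolute weight. The function name `cluster_by_singular_vectors` and the toy term-document matrix are assumptions made for illustration only.

```python
import numpy as np

def cluster_by_singular_vectors(A, k):
    """Label each column (document) of A with one of k clusters.

    Heuristic (an assumption, not taken from the paper): a document joins
    the cluster of the top-k right singular vector on which it has the
    largest absolute component.
    """
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return np.argmax(np.abs(Vt[:k]), axis=0)

# Hypothetical toy term-document matrix (rows: terms, columns: documents).
# Terms: car, automobile, engine, flower, petal.
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)

labels = cluster_by_singular_vectors(A, k=2)
print(labels)
```

In this toy matrix the two vehicle documents share one label and the two flower documents share the other, because the matrix is block-structured and each leading singular vector is supported on a single block.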
Abstract
This paper discusses clustering and latent semantic indexing (LSI) aspects of the singular value decomposition (SVD). The purpose of this paper is twofold. The first is to explain how and why the singular vectors can be used in clustering. The second is to show that the two seemingly unrelated SVD aspects actually originate from the same source: related vertices tend to be more clustered in the graph representation of the lower-rank approximate matrix obtained from the SVD than in the original semantic graph. Accordingly, the SVD can improve the retrieval performance of an information retrieval system, since queries made to the approximate matrix can retrieve more relevant documents and filter out more irrelevant documents than the same queries made to the original matrix. By utilizing this fact, we will devise an LSI algorithm that mimics SVD capability in clustering related…
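The retrieval claim in the abstract can be made concrete with a small sketch of standard truncated-SVD LSI. This is not the new algorithm the paper proposes. The helper names `rank_k_approximation` and `cosine_scores`, the toy matrix, and the choice k = 2 are all assumptions made for illustration.

```python
import numpy as np

def rank_k_approximation(A, k):
    """Best rank-k approximation of A, built from its top-k singular triplets."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def cosine_scores(M, q):
    """Cosine similarity between query vector q and every column of M."""
    norms = np.linalg.norm(M, axis=0) * np.linalg.norm(q)
    return (q @ M) / np.maximum(norms, 1e-12)  # guard against zero columns

# Hypothetical toy term-document matrix (rows: terms, columns: documents).
# Terms: car, automobile, engine, flower, petal.
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)

q = np.array([1, 0, 0, 0, 0], dtype=float)  # query containing only "car"

sim_orig = cosine_scores(A, q)                          # query against A
sim_lsi = cosine_scores(rank_k_approximation(A, 2), q)  # query against A_2

print(np.round(sim_orig, 3))
print(np.round(sim_lsi, 3))
```

Against the original matrix, document 1 ("automobile", "engine") scores zero for the query "car" because it shares no term with it. Against the rank-2 approximation it receives a positive score, since "car" and "automobile" both co-occur with "engine", while the two flower documents remain at essentially zero.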
