Vec2GC -- A Graph Based Clustering Method for Text Representations
Rajesh N Rao, Manojit Chakraborty

TL;DR
Vec2GC is a new unsupervised clustering method that uses graph community detection on text representations to effectively cluster terms or documents, supporting hierarchical clustering.
Contribution
The paper introduces Vec2GC, a novel end-to-end clustering algorithm leveraging graph community detection on learned text representations, applicable to various text corpora.
Findings
Effective clustering of terms and documents demonstrated.
Supports hierarchical clustering.
Applicable to unlabeled text data.
Abstract
NLP pipelines with limited or no labeled data rely on unsupervised methods for document processing. Unsupervised approaches typically depend on clustering of terms or documents. In this paper, we introduce a novel clustering algorithm, Vec2GC (Vector to Graph Communities), an end-to-end pipeline to cluster terms or documents for any given text corpus. Our method uses community detection on a weighted graph of the terms or documents, created using text representation learning. The Vec2GC clustering algorithm is a density-based approach that also supports hierarchical clustering.
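The pipeline described in the abstract (embeddings, then a weighted graph, then community detection) might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the similarity measure, the edge threshold, or the community detection algorithm, so cosine similarity, a threshold of 0.8, and a simple label propagation routine are used here as stand-ins. The function names and the toy vectors are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(vectors, threshold=0.8):
    """Weighted undirected graph over item indices.

    Assumption: an edge (i, j) is added when the cosine similarity of the
    two embeddings reaches `threshold`; the similarity is the edge weight.
    """
    graph = {i: {} for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:
                graph[i][j] = sim
                graph[j][i] = sim
    return graph


def label_propagation(graph, max_iter=20):
    """Stand-in community detection: weighted label propagation.

    Each node repeatedly adopts the label with the largest total edge
    weight among its neighbours; ties go to the smallest label.
    Isolated nodes keep their own label.
    """
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue
            scores = defaultdict(float)
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            best = max(sorted(scores), key=lambda label: scores[label])
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def clusters_from_labels(labels):
    """Group node indices by their final community label."""
    groups = defaultdict(list)
    for node, label in labels.items():
        groups[label].append(node)
    return sorted(sorted(members) for members in groups.values())


# Toy 2-D "embeddings" (hypothetical data): two tight groups.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = label_propagation(build_graph(vectors))
print(clusters_from_labels(labels))
```

In practice the vectors would come from a learned text representation of terms or documents, and a modularity-based community detection method could replace the label propagation step; raising or lowering the threshold changes how dense the graph is and therefore how fine-grained the communities are.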
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
