Clustering and Classification in Text Collections Using Graph Modularity

Grigory Pivovarov; Sergei Trunov

arXiv:1105.5789 · cs.IR · May 31, 2011 · 2 cites

TL;DR

The paper introduces a fast algorithm for clustering and classifying large text collections by optimizing the modularity of the word-document bipartite graph, and reports competitive quality together with record-breaking speed.

Contribution

It presents a new algorithm that uses the modularity of the word-document bipartite graph as the optimization functional for text clustering and classification; the authors report record-breaking speed.

Findings

1. Competitive clustering and classification quality.
2. Record-breaking processing speed.
3. Effective use of bipartite graph modularity.

Abstract

A new fast algorithm for clustering and classification of large collections of text documents is introduced. The algorithm operates on the bipartite graph that realizes the word-document matrix of the collection; specifically, the modularity of this bipartite graph is used as the optimization functional. Experiments with the algorithm on a number of text collections showed competitive clustering (classification) quality and record-breaking speed.
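
To make the optimization functional concrete, here is a minimal sketch of how the modularity of a word-document bipartite graph could be scored for a given assignment of words and documents to clusters. The abstract does not state the exact formula, so this assumes the commonly used bipartite modularity of Barber (2007), Q = (1/m) Σ_ij (A_ij − k_i d_j / m) δ(c_i, c_j), where A is the word-document matrix, k_i and d_j are the word and document degrees, and m is the total edge weight. The function name `bipartite_modularity` and the toy matrix are hypothetical and are not taken from the paper.

```python
def bipartite_modularity(matrix, word_labels, doc_labels):
    """Barber-style bipartite modularity (an assumed definition).

    matrix[i][j] is the weight of the edge between word i and document j,
    for example the number of times word i occurs in document j.
    word_labels[i] and doc_labels[j] are cluster ids.
    """
    total = sum(sum(row) for row in matrix)  # total edge weight m
    if total == 0:
        raise ValueError("matrix has no edges")

    word_deg = [sum(row) for row in matrix]       # k_i: row sums
    doc_deg = [sum(col) for col in zip(*matrix)]  # d_j: column sums

    q = 0.0
    for i, row in enumerate(matrix):
        for j, a_ij in enumerate(row):
            # Only word-document pairs placed in the same cluster contribute.
            if word_labels[i] == doc_labels[j]:
                q += a_ij - word_deg[i] * doc_deg[j] / total
    return q / total


# Toy collection (illustrative data only): 4 words x 4 documents,
# with two visible blocks of co-occurrence.
toy = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
]

two_clusters = bipartite_modularity(toy, [0, 0, 1, 1], [0, 0, 1, 1])
one_cluster = bipartite_modularity(toy, [0, 0, 0, 0], [0, 0, 0, 0])
```

On this toy matrix the block-aligned partition scores higher than putting everything in one cluster, which always scores zero under this definition. A clustering procedure would search for the assignment that maximizes this score; the search strategy used in the paper is not described in the abstract and is not sketched here.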

Taxonomy

Topics: Complex Network Analysis Techniques · Web Data Mining and Analysis · Advanced Graph Neural Networks