A comparison of two suffix tree-based document clustering algorithms
Muhammad Rafi, M. Maujood, M. M. Fazal, S. M. Ali

TL;DR
This paper compares two suffix tree-based document clustering algorithms, focusing on their computational efficiency and clustering quality, to evaluate their effectiveness for managing large document collections.
Contribution
It presents a comparative analysis of two recently introduced suffix tree-based clustering methods, highlighting how they differ in phrase extraction, document representation, and similarity measures.
Findings
Both algorithms effectively cluster documents using suffix trees.
The efficiency of the algorithms varies based on phrase extraction methods.
Clustering quality depends on the similarity measures used.
Abstract
Document clustering, an unsupervised approach, is extensively used to navigate, filter, summarize and manage large document collections such as the World Wide Web (WWW). Recently, the focus in this domain has shifted from traditional vector-based document similarity to suffix tree-based document similarity, as the latter offers a more semantic representation of the text in a document. In this paper, we compare and contrast two recently introduced approaches to document clustering based on the suffix tree data model. The first, Efficient Phrase-based Document Clustering, extracts phrases from documents to form a compact document representation and clusters the documents using a similarity measure based on a common suffix tree. The second, frequent word/word meaning sequence-based document clustering, similarly extracts the common word sequences from…
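The abstract names two ingredients shared by both approaches: extracting word sequences (phrases) that documents have in common, and scoring document pairs with a suffix-tree-based similarity. The sketch below illustrates only that general idea, under assumptions of our own: a naive word-level generalized suffix trie (not a compressed suffix tree built with Ukkonen's algorithm), Jaccard overlap of each document's phrase set as the similarity, and single-link merging at a hypothetical threshold. The names `build_suffix_trie`, `phrase_similarity` and `cluster` are illustrative and do not reproduce either of the compared algorithms.

```python
from collections import defaultdict
from itertools import combinations


class Node:
    """One node of a word-level suffix trie; each edge is labeled by one word."""

    def __init__(self):
        self.children = {}  # word -> Node
        self.docs = set()   # ids of documents containing the phrase ending here


def build_suffix_trie(docs):
    """Insert every word-level suffix of every document into one shared trie.

    Assumption: naive O(n^2) construction per document; a real suffix tree
    would use compressed edges and a linear-time algorithm such as Ukkonen's.
    """
    root = Node()
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for start in range(len(words)):
            node = root
            for word in words[start:]:
                node = node.children.setdefault(word, Node())
                node.docs.add(doc_id)
    return root


def phrase_similarity(root, n_docs):
    """Jaccard similarity between the phrase sets of each document pair.

    Every trie node is one distinct contiguous word sequence, so counting
    nodes per document gives its number of distinct phrases.
    Assumption: Jaccard is an illustrative choice, not the papers' measure.
    """
    own = [0] * n_docs         # distinct phrases per document
    common = defaultdict(int)  # (i, j) -> phrases shared by documents i and j
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children.values():
            for d in child.docs:
                own[d] += 1
            for pair in combinations(sorted(child.docs), 2):
                common[pair] += 1
            stack.append(child)
    return {
        (i, j): shared / (own[i] + own[j] - shared)
        for (i, j), shared in common.items()
    }


def cluster(docs, threshold=0.1):
    """Single-link clustering: merge documents whose similarity >= threshold.

    Assumption: the threshold value is hypothetical and data dependent.
    """
    parent = list(range(len(docs)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    sim = phrase_similarity(build_suffix_trie(docs), len(docs))
    for (i, j), s in sim.items():
        if s >= threshold:
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for d in range(len(docs)):
        groups[find(d)].append(d)
    return sorted(groups.values())


docs = [
    "suffix tree document clustering",
    "suffix tree based document clustering",
    "cats sleep all day",
    "cats sleep in the sun",
]
print(cluster(docs))  # [[0, 1], [2, 3]]
```

Here documents 0 and 1 share six phrases ("suffix", "tree", "document", "clustering", "suffix tree", "document clustering") out of 19 distinct phrases in total, giving a similarity of 6/19. Raising the threshold to 0.2 leaves documents 2 and 3 (similarity 3/22) in separate clusters, which shows how strongly the outcome depends on the chosen similarity measure.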
Taxonomy
Topics: Algorithms and Data Compression · Web Data Mining and Analysis · Data Mining Algorithms and Applications
