Random Indexing K-tree

Christopher M. De Vries; Lance De Vine; Shlomo Geva

arXiv:1001.0833 · cs.IR · February 2, 2010 · 1 cites

TL;DR

The paper introduces RI K-tree, a clustering algorithm that combines Random Indexing with K-tree. It scales to large inputs, supports changing document collections, avoids K-tree's earlier problems with sparse document vectors, and improves cluster quality over the original K-tree.

Contribution

The paper combines Random Indexing with K-tree, with specific modifications to K-tree, to address scalability, changing collections, and sparse document vectors in document clustering.

Findings

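01. RI K-tree improves document cluster quality over the original K-tree algorithm.

02. The method scales well with large inputs because of its low complexity.

03. It has features useful for managing a changing collection and resolves K-tree's earlier problems with sparse document vectors.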

Abstract

Random Indexing (RI) K-tree is the combination of two algorithms for clustering. Many large scale problems exist in document clustering. RI K-tree scales well with large inputs due to its low complexity. It also exhibits features that are useful for managing a changing collection. Furthermore, it solves previous issues with sparse document vectors when using K-tree. The algorithms and data structures are defined, explained and motivated. Specific modifications to K-tree are made for use with RI. Experiments have been executed to measure quality. The results indicate that RI K-tree improves document cluster quality over the original K-tree algorithm.
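The abstract does not spell out how Random Indexing turns sparse document vectors into something K-tree can handle, so the following is only a minimal sketch of the general Random Indexing idea, not the authors' implementation. Each term gets a fixed sparse ternary index vector, and a document vector is the frequency-weighted sum of the index vectors of its terms. The function names, the dimension `dim=64`, the number of non-zero entries `nonzeros=8`, and the raw term-frequency weighting are all assumptions chosen for illustration; the paper may use different values and weights.

```python
import random
from collections import Counter


def index_vector(term, dim=64, nonzeros=8, seed=0):
    """Return a sparse ternary index vector for `term` as {position: +1.0 or -1.0}.

    Hypothetical helper: the vector is derived deterministically from the term,
    so no dictionary of index vectors needs to be stored.
    """
    rng = random.Random(f"{seed}:{term}")  # string seeds are hashed deterministically
    positions = rng.sample(range(dim), nonzeros)
    return {pos: rng.choice((-1.0, 1.0)) for pos in positions}


def document_vector(tokens, dim=64, nonzeros=8, seed=0):
    """Project a bag of tokens into a dense `dim`-dimensional vector.

    Assumption: raw term frequency is used as the weight.
    """
    vec = [0.0] * dim
    for term, count in Counter(tokens).items():
        for pos, sign in index_vector(term, dim, nonzeros, seed).items():
            vec[pos] += count * sign
    return vec


def cosine(a, b):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    doc_a = "random indexing projects sparse vectors".split()
    doc_b = "random indexing reduces sparse vectors".split()
    doc_c = "tree nodes store cluster centroids".split()
    va, vb, vc = (document_vector(d) for d in (doc_a, doc_b, doc_c))
    print("similar docs:  ", round(cosine(va, vb), 3))
    print("unrelated docs:", round(cosine(va, vc), 3))
```

Because the projection is a plain sum, a new document can be projected without touching existing vectors and without fixing the vocabulary in advance, which is one plausible reading of why the approach suits a changing collection. The resulting vectors are dense and of fixed, modest dimension, so a tree of cluster centroids built over them does not have to store or average very high-dimensional sparse vectors.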

Taxonomy

Topics: Data Management and Algorithms · Data Mining Algorithms and Applications · Advanced Clustering Algorithms Research