Document Clustering with K-tree

Christopher M. De Vries; Shlomo Geva

arXiv:1001.0827 · cs.IR · January 7, 2010

TL;DR

This paper adapts the K-tree clustering algorithm to document clustering for large-scale information retrieval and reports promising results in both efficiency and clustering quality.

Contribution

The paper introduces the K-tree algorithm to information retrieval by adapting it for document clustering, emphasizing its scalability and effectiveness on large document collections.
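K-tree is generally described as a height-balanced tree of cluster centres built in the manner of a B+-tree. Each vector is inserted at the leaves by following the nearest centre at each level, and a node that overflows is split in two with k-means. The sketch below illustrates that idea. It is minimal and unoptimised, and it rests on our own assumptions: squared Euclidean distance, a deterministically seeded 2-means split, and dense Python lists instead of sparse document vectors. The names `KTree`, `order`, `two_means` and `leaves` are hypothetical. This is not the authors' implementation.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance (an assumed metric for this sketch)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_mean(vectors, weights):
    total = sum(weights)
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(len(vectors[0]))]


class Entry:
    def __init__(self, vector, count=1, child=None):
        self.vector = vector  # data vector (leaf) or cluster centre (internal node)
        self.count = count    # number of data vectors summarised by this entry
        self.child = child    # child Node for internal entries, None in leaves


class Node:
    def __init__(self, entries, leaf):
        self.entries = entries
        self.leaf = leaf


def two_means(entries, iters=10):
    """Split entries into two non-empty groups with count-weighted 2-means."""
    first = entries[0].vector
    far = max(entries, key=lambda e: dist2(e.vector, first)).vector
    centres = [first, far]  # deterministic seeding: first entry and the farthest one
    for _ in range(iters):
        groups = [[], []]
        for e in entries:
            j = 0 if dist2(e.vector, centres[0]) <= dist2(e.vector, centres[1]) else 1
            groups[j].append(e)
        if not groups[0] or not groups[1]:
            half = len(entries) // 2  # degenerate case, e.g. duplicate vectors
            return [entries[:half], entries[half:]]
        centres = [weighted_mean([e.vector for e in g], [e.count for e in g])
                   for g in groups]
    return groups


class KTree:
    def __init__(self, order):
        self.order = order  # maximum entries per node (hypothetical parameter)
        self.root = Node([], leaf=True)

    def insert(self, x):
        split = self._insert(self.root, list(x))
        if split is not None:
            # The root overflowed: grow a new root above it, so all leaves
            # stay at the same depth (the tree grows upward, as in a B+-tree).
            self.root = Node(split, leaf=False)

    @staticmethod
    def _summarise(node):
        counts = [e.count for e in node.entries]
        return Entry(weighted_mean([e.vector for e in node.entries], counts),
                     sum(counts), node)

    def _insert(self, node, x):
        if node.leaf:
            node.entries.append(Entry(x))
        else:
            # Descend along the nearest cluster centre.
            i = min(range(len(node.entries)),
                    key=lambda k: dist2(node.entries[k].vector, x))
            split = self._insert(node.entries[i].child, x)
            if split is None:
                # Child absorbed x: recompute its centre (mean of all vectors below).
                node.entries[i] = self._summarise(node.entries[i].child)
            else:
                # Child split in two: replace its entry with both halves.
                node.entries[i:i + 1] = split
        if len(node.entries) > self.order:
            return [self._summarise(Node(g, node.leaf)) for g in two_means(node.entries)]
        return None

    def leaves(self, node=None, depth=0):
        """Yield (depth, leaf node); each leaf's vectors form one fine-grained cluster."""
        if node is None:
            node = self.root
        if node.leaf:
            yield depth, node
        else:
            for e in node.entries:
                yield from self.leaves(e.child, depth + 1)


if __name__ == "__main__":
    random.seed(0)
    tree = KTree(order=4)
    for _ in range(100):
        c = random.choice([0.0, 10.0])
        tree.insert([random.gauss(c, 1.0), random.gauss(c, 1.0)])
    print(len(list(tree.leaves())), "leaf clusters")
```

Each insertion only compares the new vector against at most `order` centres per level, and the tree is height-balanced. That is the intuition behind the low-complexity claim in the abstract, though the costs in the paper's actual implementation may differ.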

Findings

1. K-tree scales efficiently to large datasets.

2. K-tree gives promising clustering quality.

3. Support Vector Machines were used for document classification.

Abstract

This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document clustering. Many large scale problems exist in document clustering. K-tree scales well with large inputs due to its low complexity. It offers promising results both in terms of efficiency and quality. Document classification was completed using Support Vector Machines.
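The abstract says only that Support Vector Machines were used for classification. It does not name a solver. The sketch below shows what training a linear SVM involves, using Pegasos, a stochastic sub-gradient method for the regularised hinge-loss objective. Pegasos is our choice and not necessarily the paper's. The function names, the regularisation weight `lam`, and the toy data are hypothetical. Real document features would likely be sparse, high-dimensional term vectors.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pegasos(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM trained with Pegasos stochastic sub-gradient descent.

    X: list of feature vectors; y: labels in {-1, +1}.
    There is no separate bias term: append a constant 1.0 feature instead.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)                    # Pegasos step size
            violates = y[i] * dot(w, X[i]) < 1.0     # is the hinge loss active?
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 regulariser
            if violates:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0.0 else -1


# Toy usage: two Gaussian blobs, with a constant 1.0 feature acting as the bias.
rng = random.Random(42)
X, y = [], []
for _ in range(100):
    label = rng.choice([-1, 1])
    X.append([rng.gauss(3.0 * label, 1.0), rng.gauss(3.0 * label, 1.0), 1.0])
    y.append(label)
w = train_pegasos(X, y)
```

A binary classifier like this handles one category at a time. For a multi-category collection, one common approach, assumed here and not taken from the paper, is one-vs-rest: train one classifier per category and assign the category with the highest score.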
