Document Clustering using K-Means and K-Medoids

Rakesh Chandra Balabantaray; Chandrali Sarma; Monica Jha

arXiv:1502.07938 · cs.IR · March 2, 2015 · 33 cites

TL;DR

This paper compares the K-Means and K-Medoids algorithms for document clustering and then summarizes the key points of the documents in the best clusters, aiming to make information retrieval faster.

Contribution

It evaluates and compares how effectively K-Means and K-Medoids cluster documents, then applies sentence-weight-based summarization to the best clusters so that key information is easier to find.
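
The K-Means vs. K-Medoids comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the term-frequency vectors, cosine distance, deterministic farthest-first seeding and the simple alternating K-Medoids update (rather than full PAM swaps) are all assumptions, and every function name (`tf_vectors`, `k_means`, `k_medoids`, etc.) is hypothetical. The point it shows is the core difference between the two methods: K-Means represents each cluster by an averaged centroid, while K-Medoids uses an actual document (the medoid), which makes it less sensitive to outliers.

```python
import math
from collections import Counter

# Assumption: documents become lowercased, whitespace-tokenized term-frequency
# vectors stored as sparse dicts; the paper's preprocessing is not reproduced.
def tf_vectors(docs):
    return [Counter(doc.lower().split()) for doc in docs]

def cosine_distance(a, b):
    """1 - cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0 or nb == 0:
        return 1.0  # treat empty documents as maximally distant
    return 1.0 - dot / (na * nb)

def farthest_first(vectors, k):
    """Assumed deterministic seeding: start at document 0, then repeatedly add
    the document farthest from every seed chosen so far."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of documents")
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(len(vectors)),
                         key=lambda i: min(cosine_distance(vectors[i], vectors[s]) for s in seeds)))
    return seeds

def assign(vectors, reps):
    """Label each document with the index of its nearest representative."""
    return [min(range(len(reps)), key=lambda c: cosine_distance(v, reps[c])) for v in vectors]

def mean_vector(vectors):
    """K-Means centroid: average weight of every term across the cluster."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}

def k_means(vectors, k, max_iter=20):
    centers = [dict(vectors[i]) for i in farthest_first(vectors, k)]
    labels = assign(vectors, centers)
    for _ in range(max_iter):
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centers[c] = mean_vector(members)
        new_labels = assign(vectors, centers)
        if new_labels == labels:  # converged
            break
        labels = new_labels
    return labels, centers

def k_medoids(vectors, k, max_iter=20):
    medoids = farthest_first(vectors, k)  # medoids are indices of real documents
    labels = assign(vectors, [vectors[m] for m in medoids])
    for _ in range(max_iter):
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:  # new medoid: member with the smallest total distance to the rest
                medoids[c] = min(members, key=lambda i: sum(
                    cosine_distance(vectors[i], vectors[j]) for j in members))
        new_labels = assign(vectors, [vectors[m] for m in medoids])
        if new_labels == labels:  # converged
            break
        labels = new_labels
    return labels, medoids

if __name__ == "__main__":
    docs = ["apple banana fruit", "banana fruit salad", "apple fruit juice",
            "python java code", "java code compiler", "python code script"]
    vecs = tf_vectors(docs)
    print("K-Means  :", k_means(vecs, 2)[0])
    print("K-Medoids:", k_medoids(vecs, 2)[0])
```

On the six toy documents, both algorithms print `[0, 0, 0, 1, 1, 1]`, separating the fruit documents from the programming ones. The cluster-quality measure used in the paper is not given in this summary, so the sketch makes no claim about which algorithm scores better.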

Findings

01. K-Medoids outperforms K-Means in cluster quality.
02. Summarizing the best clusters highlights the key points of their documents.
03. Clustering enables quicker information retrieval.

Abstract

With the huge upsurge of information in day-to-day life, it has become difficult to assemble relevant information in the nick of time. People are always short of time and need everything quickly. Hence, clustering was introduced to gather related information into a cluster. Of the several algorithms available for clustering, in this paper we implement the K-Means and K-Medoids clustering algorithms and compare them to find which is better suited to clustering. On the best clusters formed, document summarization based on sentence weight is performed to focus on the key points of the whole document. This makes it easier for people to find the information they want and thus read only those documents that are relevant from their point of view.
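
The abstract says summarization is "based on sentence weight" but does not define the weight here. The following Python sketch assumes one simple definition: a sentence's weight is the average in-document frequency of its non-stop-word terms, and the top `n` sentences are kept in their original order. `STOPWORDS`, `split_sentences`, `content_words` and `summarize` are hypothetical names, not the paper's code.

```python
import re
from collections import Counter

# Hypothetical, minimal stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "as", "for", "in", "is", "it", "of",
             "on", "that", "the", "this", "to", "with"}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def content_words(sentence):
    """Lowercased alphanumeric tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOPWORDS]

def summarize(text, n=2):
    """Return the n highest-weighted sentences in their original order.

    Assumed weighting: average document-level frequency of a sentence's content words.
    """
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def weight(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Python's sort is stable even with reverse=True, so ties keep sentence order.
    ranked = sorted(range(len(sentences)), key=lambda i: weight(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]

if __name__ == "__main__":
    doc = ("Clustering groups similar documents. K-Means uses centroids. "
           "K-Medoids uses actual documents as medoids. Clustering documents speeds retrieval.")
    for sentence in summarize(doc, n=2):
        print(sentence)
```

Run as a script, it keeps the first and third sentences of the toy text. Averaging rather than summing the term frequencies is a design choice that avoids favoring long sentences; TF-IDF weights would be an equally plausible assumption.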

Taxonomy

Topics: Advanced Clustering Algorithms Research · Data Management and Algorithms · Data Mining Algorithms and Applications