Comparison Clustering using Cosine and Fuzzy set based Similarity Measures of Text Documents

Manan Mohan Goyal; Neha Agrawal; Manoj Kumar Sarma; Nayan Jyoti Kalita

arXiv:1505.00168·cs.IR·May 4, 2015·2 cites

TL;DR

This paper compares the effectiveness of cosine and fuzzy set-based similarity measures in K-means clustering of text documents, aiming to identify the most accurate method for document clustering.

Contribution

It introduces a comparative analysis of cosine and fuzzy similarity measures for K-means clustering of text documents, highlighting their impact on clustering accuracy.

Findings

01. Cosine similarity yields higher clustering accuracy than fuzzy set similarity.

02. The optimum similarity measure is decided using the start and end times of cluster formation.

03. Fuzzy set similarity offers a viable alternative in specific clustering scenarios.

Abstract

Given the high demand for clustering, this paper focuses on understanding and implementing K-means clustering using two different similarity measures. We cluster the documents using these two measures rather than Euclidean distance. We also compare the fuzzy and cosine similarity measures on clustering accuracy. The start and end times of cluster formation are used in deciding the optimum similarity measure.
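
The setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not define the fuzzy set based measure, so the ratio of summed minimum to summed maximum memberships (a fuzzy Jaccard form) is used here purely as an assumption. The raw term-frequency vectors, the farthest-first choice of initial centroids, and the helper names (tf_vector, fuzzy_sim, timed_kmeans, and so on) are likewise hypothetical choices made for this sketch.

```python
import math
import time
from collections import Counter


def tf_vector(text, vocab):
    """Raw term-frequency vector of a document over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def cosine_sim(a, b):
    """Cosine of the angle between two term vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def fuzzy_sim(a, b):
    """ASSUMED fuzzy measure: sum of min memberships over sum of max memberships.

    Each vector is scaled by its largest entry so that every term gets a
    membership degree in [0, 1]. The paper may define its measure differently.
    """
    ma, mb = max(a, default=0.0), max(b, default=0.0)
    if not ma or not mb:
        return 0.0
    ua = [x / ma for x in a]
    ub = [y / mb for y in b]
    union = sum(max(x, y) for x, y in zip(ua, ub))
    return sum(min(x, y) for x, y in zip(ua, ub)) / union if union else 0.0


def init_centroids(vectors, k, sim):
    """Deterministic farthest-first initialisation (an assumption of this sketch)."""
    chosen = [vectors[0]]
    while len(chosen) < k:
        # Take the vector whose best similarity to any chosen centroid is lowest.
        chosen.append(min(vectors, key=lambda v: max(sim(v, c) for c in chosen)))
    return [list(c) for c in chosen]


def kmeans(vectors, k, sim, iters=20):
    """K-means that assigns each vector to its most similar centroid."""
    centroids = init_centroids(vectors, k, sim)
    labels = [0] * len(vectors)
    for step in range(iters):
        new_labels = [max(range(k), key=lambda c: sim(v, centroids[c]))
                      for v in vectors]
        if step > 0 and new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def timed_kmeans(vectors, k, sim):
    """Run kmeans and return (labels, elapsed seconds) from start and end times."""
    start = time.perf_counter()
    labels = kmeans(vectors, k, sim)
    end = time.perf_counter()
    return labels, end - start


docs = ["cat dog cat pet", "dog pet cat",
        "stock market trade", "market stock price trade"]
vocab = sorted({w for d in docs for w in d.split()})
vectors = [tf_vector(d, vocab) for d in docs]

cos_labels, cos_time = timed_kmeans(vectors, 2, cosine_sim)
fuz_labels, fuz_time = timed_kmeans(vectors, 2, fuzzy_sim)
print("cosine:", cos_labels, f"{cos_time:.6f}s")
print("fuzzy: ", fuz_labels, f"{fuz_time:.6f}s")
```

On this toy corpus both measures separate the two pet documents from the two finance documents, so it says nothing about which measure is more accurate or faster on real data. Comparing the elapsed times on a realistic collection would be the analogue of the start time and end time comparison mentioned in the abstract.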

Taxonomy

Topics: Text and Document Classification Technologies