Application of Fuzzy Clustering for Text Data Dimensionality Reduction

Amir Karami

arXiv:1909.10881 · cs.CL · September 25, 2019

TL;DR

This paper investigates fuzzy clustering as an unsupervised feature transformation (UFT) method for reducing the dimensionality of text data and reports that it outperforms principal component analysis (PCA) and singular value decomposition (SVD).

Contribution

The paper introduces fuzzy clustering as a new UFT-based approach to text data dimensionality reduction and shows that it outperforms PCA and SVD.

Findings

01. Fuzzy clustering outperforms PCA and SVD.

02. Global term weighting improves fuzzy clustering results.

03. The choice of fuzzifier value affects clustering effectiveness.

Abstract

Large textual corpora are often represented by the document-term frequency matrix whose elements are the frequency of terms; however, this matrix has two problems: sparsity and high dimensionality. Four dimension reduction strategies are used to address these problems. Of the four strategies, unsupervised feature transformation (UFT) is a popular and efficient strategy to map the terms to a new basis in the document-term frequency matrix. Although several UFT-based methods have been developed, fuzzy clustering has not been considered for dimensionality reduction. This research explores fuzzy clustering as a new UFT-based approach to create a lower-dimensional representation of documents. Performance of fuzzy clustering with and without using global term weighting methods is shown to exceed principal component analysis and singular value decomposition. This study also explores the effect…
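The idea in the abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not name a specific algorithm, so standard fuzzy c-means is assumed here, inverse document frequency (IDF) stands in for "global term weighting", and the function names `idf_weight` and `fuzzy_c_means` as well as the toy matrix are invented for this example. Each document's cluster memberships serve as its lower-dimensional representation.

```python
import numpy as np


def idf_weight(X):
    """Scale term counts by inverse document frequency (assumed weighting)."""
    n_docs = X.shape[0]
    doc_freq = np.count_nonzero(X, axis=0)          # documents containing each term
    idf = np.log(n_docs / np.maximum(doc_freq, 1))  # guard against unused terms
    return X * idf


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (assumed algorithm); returns (memberships, centers).

    memberships has shape (n_docs, n_clusters) and each row sums to 1.
    m > 1 is the fuzzifier: values near 1 give almost hard assignments,
    larger values give softer, more uniform memberships.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)               # random memberships, rows sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every document to every cluster center
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)              # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers


# Toy document-term frequency matrix: 4 documents, 6 terms (made-up data).
X = np.array([
    [3, 2, 0, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 0, 2, 3, 1],
    [0, 0, 1, 3, 2, 0],
], dtype=float)

reduced, _ = fuzzy_c_means(idf_weight(X), n_clusters=2, m=2.0)
print(reduced.shape)  # (4, 2): six term features reduced to two memberships
```

Passing `X` directly instead of `idf_weight(X)` gives the variant without global term weighting, and varying `m` is one way to explore the effect of the fuzzifier.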

Taxonomy

Topics: Text and Document Classification Technologies · Advanced Clustering Algorithms Research