A Fuzzy Based Approach to Text Mining and Document Clustering
Sumit Goswami, Mayank Singh Shishodia

TL;DR
This paper applies fuzzy logic to text mining: documents are represented by normalized word frequencies and clustered with fuzzy c-means (FCM), which assigns each document a degree of membership in each category rather than a single hard label.
Contribution
It applies fuzzy c-means clustering to document categorization, combining it with an analysis of the selected word-frequency features and reporting each document's degree of membership in each cluster.
Findings
Effective clustering of documents into two categories.
Documents tend to have higher normalized word frequencies for the features associated with their own category.
Fuzzy logic provides degrees of membership, enriching cluster interpretability.
Abstract
Fuzzy logic deals with degrees of truth. In this paper, we show how to apply fuzzy logic in text mining in order to perform document clustering. We took an example of document clustering where the documents had to be clustered into two categories. The method involved cleaning up the text and stemming the words. Then, we chose m features which differ significantly in their word frequencies (WF), normalized by document length, between documents belonging to these two clusters. The documents to be clustered were represented as a collection of m normalized WF values. The fuzzy c-means (FCM) algorithm was used to cluster these documents into two clusters. After the FCM execution finished, the documents in the two clusters were analysed for the values of their respective m features. It was known that documents belonging to a document type, say X, tend to have higher WF values for…
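The pipeline described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature terms, the sample documents, the fuzzifier value of 2.0, and the function names `normalized_wf` and `fuzzy_c_means` are all hypothetical, and the stemming and clean-up steps are reduced to lowercasing and tokenizing. The clustering routine follows the standard FCM update rules.

```python
import random
import re
from collections import Counter

def normalized_wf(text, features):
    """Frequency of each feature term divided by document length (in tokens)."""
    tokens = re.findall(r"[a-z]+", text.lower())  # crude clean-up; no stemming here
    counts = Counter(tokens)
    n = max(len(tokens), 1)
    return [counts[f] / n for f in features]

def fuzzy_c_means(points, c=2, fuzzifier=2.0, iters=100, tol=1e-6, seed=0):
    """Standard FCM. Returns (centers, u), where u[i][j] is the degree of
    membership of point i in cluster j and each row of u sums to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized per point.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    exponent = 2.0 / (fuzzifier - 1.0)
    centers = []
    for _ in range(iters):
        # Cluster centers: means of the points weighted by u^fuzzifier.
        centers = []
        for j in range(c):
            w = [u[i][j] ** fuzzifier for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Memberships: inverse-distance ratios to every center.
        new_u = []
        for p in points:
            dists = [sum((p[d] - ctr[d]) ** 2 for d in range(dim)) ** 0.5
                     for ctr in centers]
            if any(dist == 0.0 for dist in dists):
                # Point coincides with a center: give it full membership there.
                hits = [1.0 if dist == 0.0 else 0.0 for dist in dists]
                new_u.append([h / sum(hits) for h in hits])
            else:
                new_u.append([1.0 / sum((dists[j] / dists[k]) ** exponent
                                        for k in range(c)) for j in range(c)])
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u

# Hypothetical feature terms and toy documents (not from the paper).
features = ["goal", "match", "vote", "election"]
docs = [
    "the striker scored a goal late in the match",
    "one goal decided the match and the next match",
    "voters cast a vote in the election on sunday",
    "the election result followed a close vote count",
]
vectors = [normalized_wf(doc, features) for doc in docs]
centers, memberships = fuzzy_c_means(vectors, c=2)
labels = [max(range(2), key=lambda j: row[j]) for row in memberships]
for doc, row in zip(docs, memberships):
    print([round(v, 3) for v in row], doc)
```

Each printed row gives a document's degree of membership in the two clusters. Taking the larger value yields a hard label, while the raw values show how strongly a document belongs to its cluster.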
Taxonomy
Topics: Rough Sets and Fuzzy Logic · Advanced Text Analysis Techniques · Data Management and Algorithms
