An effective web document clustering for information retrieval
R.K. Roul, S.K. Sahay

TL;DR
This paper presents a combined web document clustering method using Frequent Pattern growth and Fuzzy C-Means to improve clustering accuracy and efficiency for information retrieval from large web datasets.
Contribution
It introduces a novel hybrid approach that enhances traditional clustering by integrating frequent pattern mining with fuzzy clustering, addressing initial centroid sensitivity.
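To make the centroid-sensitivity point concrete, here is a minimal sketch of the standard Fuzzy C-Means update loop in pure Python. It takes the starting centroids as an argument, so a caller can pass seeds derived from some earlier step instead of random ones. The function name `fuzzy_c_means`, the toy points, and the seed values are illustrative assumptions; the summary does not say how the paper actually chooses its initial centroids.

```python
import math

def fuzzy_c_means(points, centroids, m=2.0, iters=50, eps=1e-9):
    """Standard Fuzzy C-Means, started from caller-supplied centroids (hypothetical helper)."""
    centroids = [list(c) for c in centroids]
    power = 2.0 / (m - 1.0)  # exponent used in the membership formula
    memberships = []
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        memberships = []
        for p in points:
            dists = [max(math.dist(p, c), eps) for c in centroids]
            row = [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]
            memberships.append(row)
        # Centroid update: mean of all points, weighted by u_ij^m
        for j in range(len(centroids)):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            centroids[j] = [
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(len(points[0]))
            ]
    return memberships, centroids

# Toy data: two well-separated groups; the seeds are assumed, not taken from the paper.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = [(0, 0), (10, 10)]
memberships, centroids = fuzzy_c_means(points, seeds)
```

Each row of `memberships` gives one point's degree of membership in every cluster and sums to 1. Because the result depends on where the loop starts, a method that supplies informed seeds can reduce run-to-run variation compared with random initialization.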
Findings
Outperforms traditional clustering methods in efficiency
Less sensitive to the choice of initial centroids
More effective for large web datasets
Abstract
The size of the web has increased exponentially over the past few years, with thousands of documents related to a subject available to the user. With this much information available, it is not possible to take full advantage of the World Wide Web without a proper framework to search through the available data. This requisite organization can be done in many ways. In this paper we introduce a combined approach to cluster web pages, which first finds the frequent sets and then clusters the documents. These frequent sets are generated using the Frequent Pattern growth technique. Then, by applying the Fuzzy C-Means algorithm to them, we found clusters whose documents are highly related and have similar features. We used the Gensim package to implement our approach because of its simplicity and robust nature. We have compared our results with the combined approach of (Frequent…
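The first stage described in the abstract, finding frequent term sets before clustering, can be sketched as follows. This is not FP-growth: for brevity it swaps in a brute-force enumeration of term combinations, which yields the same frequent sets on small inputs but does not scale. It also does not use Gensim. The function names, the `min_support` and `max_size` parameters, the whitespace tokenization, and the binary feature encoding are all assumptions made for illustration.

```python
from itertools import combinations

def frequent_term_sets(docs, min_support, max_size=2):
    """Return {frozenset(terms): count} for term sets found in >= min_support documents.

    Brute-force stand-in for FP-growth; only suitable for tiny corpora.
    """
    doc_terms = [set(d.lower().split()) for d in docs]
    frequent = {}
    for size in range(1, max_size + 1):
        counts = {}
        for terms in doc_terms:
            for combo in combinations(sorted(terms), size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
        level = {k: v for k, v in counts.items() if v >= min_support}
        if not level:
            break  # no frequent sets of this size, so none of a larger size either
        frequent.update(level)
    return frequent

def to_feature_vectors(docs, term_sets):
    """Encode each document as a 0/1 vector: does it contain each frequent set? (assumed encoding)"""
    ordered = sorted(term_sets, key=lambda s: (len(s), sorted(s)))
    vectors = []
    for d in docs:
        terms = set(d.lower().split())
        vectors.append([1.0 if s <= terms else 0.0 for s in ordered])
    return ordered, vectors

# Hypothetical mini-corpus for illustration only.
docs = [
    "web document clustering",
    "fuzzy clustering web",
    "frequent pattern mining",
    "frequent pattern growth",
]
frequent = frequent_term_sets(docs, min_support=2)
ordered, vectors = to_feature_vectors(docs, frequent)
```

The resulting `vectors` are the kind of numeric input a fuzzy clustering step could then consume; documents that share frequent term sets end up with similar vectors.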
Taxonomy
Topics: Web Data Mining and Analysis · Advanced Clustering Algorithms Research · Text and Document Classification Technologies
