Web Document Clustering and Ranking using Tf-Idf based Apriori Approach
R.K. Roul, O. R. Devanand, S.K. Sahay

TL;DR
This paper introduces a novel Tf-Idf based Apriori approach for clustering and ranking web documents, improving relevance and retrieval efficiency for large unstructured datasets.
Contribution
The paper proposes a new clustering and ranking method combining Tf-Idf with Apriori, tailored for web documents, and demonstrates its effectiveness on large datasets.
Findings
Better clustering results at higher minimum support
Achieved an F-measure of 78% in ranking accuracy
Outperforms the traditional Apriori algorithm
Abstract
The dynamic web has grown exponentially over the past few years, with thousands of documents related to a subject now available to the user. Most web documents are unstructured and unorganized, so users find it difficult to locate relevant documents. A more useful and efficient mechanism is to combine clustering with ranking: clustering groups similar documents together, and ranking is applied to each cluster so that the top documents appear first. Besides the particular clustering algorithm, the term weighting functions applied to the selected features that represent a web document are a main aspect of the clustering task. With this in mind, we propose a new mechanism called Tf-Idf based Apriori for clustering web documents. We then rank the documents in each cluster using Tf-Idf and similarity…
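The ranking step mentioned in the abstract (Tf-Idf weights plus a similarity measure) can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated and does not specify the similarity measure, so cosine similarity against a query is assumed here, and the function names (idf_table, weight, cosine, rank_cluster) and the toy data are hypothetical. The Apriori-based clustering step is not shown.

```python
import math
from collections import Counter


def idf_table(docs):
    """Map each term to log(N / df), where df counts documents containing it."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def weight(tokens, idf):
    """Tf-Idf vector as {term: weight}; terms unseen in the cluster are dropped."""
    counts = Counter(tokens)
    return {
        term: (count / len(tokens)) * idf[term]
        for term, count in counts.items()
        if term in idf
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_cluster(docs, query):
    """Return document indices ordered from most to least similar to the query.

    Assumption: similarity is cosine over Tf-Idf vectors; ties keep index order.
    """
    idf = idf_table(docs)
    query_vec = weight(query, idf)
    scores = [(cosine(weight(doc, idf), query_vec), i) for i, doc in enumerate(docs)]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]


# Toy cluster of three tokenized documents (made-up data).
cluster = [
    ["web", "cluster", "rank"],
    ["web", "apriori", "support"],
    ["web", "cluster", "cluster"],
]
ranking = rank_cluster(cluster, ["cluster"])
```

Note that a term occurring in every document of the cluster (here "web") receives an Idf of zero and therefore does not influence the ranking.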
Taxonomy
Topics: Text and Document Classification Technologies · Web Data Mining and Analysis · Advanced Clustering Algorithms Research
