A Novel Approach for Web Page Set Mining
R.B. Geeta, Omkar Mamillapalli, Shasikumar G. Totad, Prasad Reddy P.V.G.D

TL;DR
This paper introduces a hash index table structure for efficient web page set mining from server log files, enabling faster processing, incremental updates, and better performance compared to traditional flat file algorithms.
Contribution
It presents a novel hash index table method for web page set mining that improves efficiency and supports incremental updates without reaccessing the original database.
Findings
Performance is comparable to or better than that of flat file algorithms.
Supports incremental updates without reprocessing the entire database.
Effective for both sparse and dense data distributions.
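The incremental-update finding can be illustrated with a minimal sketch. It assumes the index is a plain inverted hash map from page URL to the set of transaction ids containing that page; this is a stand-in chosen for illustration, not the authors' hash index table. The class name `PageSetIndex`, its methods, and the sample URLs are all hypothetical.

```python
from collections import defaultdict
from itertools import count


class PageSetIndex:
    """Illustrative inverted hash index (an assumption, not the paper's structure)."""

    def __init__(self):
        self._postings = defaultdict(set)  # page URL -> ids of transactions containing it
        self._next_tid = count()           # generator of fresh transaction ids
        self.size = 0                      # number of transactions indexed so far

    def add(self, pages):
        """Index one new transaction; earlier transactions are not re-read."""
        tid = next(self._next_tid)
        for page in set(pages):
            self._postings[page].add(tid)
        self.size += 1
        return tid

    def support(self, pages):
        """Number of indexed transactions that contain every page in `pages`."""
        sets = [self._postings.get(page, set()) for page in pages]
        return len(set.intersection(*sets)) if sets else 0


# Hypothetical sample data: two initial transactions, then one arriving later.
idx = PageSetIndex()
for transaction in (["/a", "/b"], ["/a", "/c"]):
    idx.add(transaction)
before = idx.support(["/a", "/b"])

idx.add(["/a", "/b", "/c"])  # incremental update: only the index is touched
after = idx.support(["/a", "/b"])
```

In this sketch, adding a transaction only touches the posting sets of the pages it contains, so support counts stay current without rescanning earlier log entries.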
Abstract
One of the most time-consuming steps in association rule mining is computing the frequency of occurrence of itemsets in the database. The hash table index approach converts a transaction database into a hash index tree by scanning the transaction database only once. Whenever a user requests a Uniform Resource Locator (URL), the request is recorded in the server's log file. This paper presents the hash index table structure, a general and dense structure that supports web page set extraction from the server's log file. This hash table provides information about the original database. Web Page set mining (WPs-Mine) provides a complete representation of the original database. This approach works well for both sparse and dense data distributions. Web page set mining supported by the hash table index shows performance always comparable with, and often better than…
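As a rough illustration of the single-scan idea in the abstract, the sketch below reads a log once and builds a hash map from each URL to the set of clients that requested it, then counts the frequency of a page set by intersecting those sets. The two-field log format (`<client> <URL>`), the sample lines, the choice of one transaction per client, and the function names `build_index` and `support` are assumptions made for this example; the paper's actual table layout is not reproduced here.

```python
from collections import defaultdict

# Hypothetical log lines in an assumed "<client> <requested URL>" format.
LOG_LINES = [
    "10.0.0.1 /index.html",
    "10.0.0.1 /products.html",
    "10.0.0.2 /index.html",
    "10.0.0.2 /products.html",
    "10.0.0.2 /contact.html",
    "10.0.0.3 /index.html",
]


def build_index(lines):
    """Scan the log once, mapping each URL to the set of clients that requested it."""
    index = defaultdict(set)
    for line in lines:
        client, url = line.split()
        index[url].add(client)
    return index


def support(index, pages):
    """Count clients whose requests include every page in `pages`."""
    sets = [index.get(page, set()) for page in pages]
    return len(set.intersection(*sets)) if sets else 0


index = build_index(LOG_LINES)
pair_support = support(index, ["/index.html", "/products.html"])
```

Once the index exists, frequency counts for any candidate page set come from hash lookups and set intersections rather than another pass over the log.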
Taxonomy
Topics: Data Mining Algorithms and Applications · Web Data Mining and Analysis · Rough Sets and Fuzzy Logic
