Preprocessing: A Prerequisite for Discovering Patterns in WUM Process
C. Ramya, K S Shreedhara, G Kavitha

TL;DR
This paper presents a comprehensive preprocessing methodology for web log data, significantly reducing data size and enhancing structure to facilitate more effective web usage pattern discovery.
Contribution
It introduces a complete preprocessing framework including merging, cleaning, session identification, and formatting, validated by experiments showing substantial data size reduction and improved data quality.
Findings
Reduces web log data size to 73-82% of the original size
Produces richer, structured logs for analysis
Enhances pattern discovery in web usage mining
Abstract
Web log data is usually diverse and voluminous. To be used for pattern discovery, it must be assembled into a consistent, integrated and comprehensive view. Without properly cleaning, transforming and structuring the data before analysis, one cannot expect to find meaningful patterns. As in most data mining applications, preprocessing involves removing redundant and irrelevant data, filtering out noise, transforming the data and resolving inconsistencies. This paper proposes a complete preprocessing methodology, comprising merging, data cleaning, user/session identification, and data formatting and summarization, that improves data quality while reducing data volume. To validate the efficiency of the proposed methodology, several experiments were conducted, and the results show that the proposed methodology reduces…
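The stages named in the abstract (merging, cleaning, session identification) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes logs in Common Log Format, an assumed cleaning rule (keep successful GET requests for non-asset URLs), and a 30-minute inactivity timeout for sessions, a widely used heuristic that the text above does not specify. All names (`parse_line`, `is_relevant`, `preprocess`, `IGNORED_SUFFIXES`, `SESSION_TIMEOUT`) are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed input: Common Log Format, e.g.
# host ident user [10/Oct/2023:13:00:00 +0000] "GET /page HTTP/1.1" 200 512
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
# Assumed cleaning rule: embedded assets are treated as irrelevant requests.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
# Assumed session threshold; not taken from the paper.
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Parse one log line into a dict, or return None if it is malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {
        "host": match.group("host"),
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "method": match.group("method"),
        "url": match.group("url"),
        "status": int(match.group("status")),
    }


def is_relevant(entry):
    """Cleaning step: keep successful GET requests for non-asset URLs."""
    if entry["method"] != "GET":
        return False
    if not 200 <= entry["status"] < 300:
        return False
    return not entry["url"].lower().endswith(IGNORED_SUFFIXES)


def preprocess(*log_files):
    """Merge several logs, clean them, and group entries into sessions."""
    # Merging and cleaning: pool the relevant entries from every log.
    entries = []
    for lines in log_files:
        for line in lines:
            entry = parse_line(line)
            if entry is not None and is_relevant(entry):
                entries.append(entry)
    entries.sort(key=lambda e: e["time"])

    # Session identification: a host stands in for a user (a simplification);
    # a gap longer than SESSION_TIMEOUT starts a new session for that host.
    sessions = []
    open_session = {}  # host -> index of that host's latest session
    for entry in entries:
        idx = open_session.get(entry["host"])
        if idx is None or entry["time"] - sessions[idx][-1]["time"] > SESSION_TIMEOUT:
            sessions.append([entry])
            open_session[entry["host"]] = len(sessions) - 1
        else:
            sessions[idx].append(entry)
    return sessions
```

Each argument to `preprocess` is any iterable of log lines, such as an open file, so logs from several servers can be passed together. Identifying users by host alone is a deliberate simplification here; a fuller treatment would likely also consider fields such as the user agent.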
Taxonomy
Topics: Data Mining Algorithms and Applications · Data Management and Algorithms · Rough Sets and Fuzzy Logic
