Preprocessing: A Prerequisite for Discovering Patterns in WUM Process

C. Ramya; K S Shreedhara; G Kavitha

arXiv:1104.2284·cs.DB·April 13, 2011

TL;DR

This paper presents a comprehensive preprocessing methodology for web log data, significantly reducing data size and enhancing structure to facilitate more effective web usage pattern discovery.

Contribution

It introduces a complete preprocessing framework including merging, cleaning, session identification, and formatting, validated by experiments showing substantial data size reduction and improved data quality.

Findings

01 Reduces web log data size to 73-82% of original

02 Produces richer, structured logs for analysis

03 Enhances pattern discovery in web usage mining

Abstract

Web log data is usually diverse and voluminous. It must be assembled into a consistent, integrated and comprehensive view before it can be used for pattern discovery. Without properly cleaning, transforming and structuring the data prior to analysis, one cannot expect to find meaningful patterns. As in most data mining applications, preprocessing involves filtering out redundant and irrelevant data, removing noise, transforming the data and resolving inconsistencies. This paper proposes a complete preprocessing methodology comprising merging, data cleaning, user/session identification, and data formatting and summarization, which improves the quality of the data while reducing its quantity. To validate the efficiency of the proposed methodology, several experiments are conducted and the results show that the proposed methodology reduces…
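The stages named in the abstract (merging, cleaning, session identification) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It rests on several assumptions that do not come from the paper: the logs are in Common Log Format, only successful GET requests for non-asset URLs are kept, a user is approximated by IP address, and a session ends after 30 minutes of inactivity. All function and constant names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumption: entries follow the Common Log Format; the paper's actual format is not shown here.
CLF = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
# Assumption: requests for these asset types are irrelevant to usage analysis.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
# Assumption: a gap longer than 30 minutes starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)


def parse(line):
    """Turn one raw log line into a dict, or None if it is malformed."""
    m = CLF.match(line)
    if m is None:
        return None
    return {
        "ip": m["ip"],
        "time": datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": m["method"],
        "url": m["url"],
        "status": int(m["status"]),
    }


def merge(*logs):
    """Merging: combine several log files into one list ordered by time."""
    entries = [e for log in logs for e in map(parse, log) if e is not None]
    return sorted(entries, key=lambda e: e["time"])


def clean(entries):
    """Cleaning: keep successful GET requests that are not asset fetches."""
    return [
        e for e in entries
        if e["method"] == "GET"
        and 200 <= e["status"] < 300
        and not e["url"].lower().endswith(IGNORED_SUFFIXES)
    ]


def sessionize(entries):
    """Session identification: split each IP's requests on inactivity gaps."""
    sessions = []
    current = {}    # ip -> index of that IP's open session in `sessions`
    last_seen = {}  # ip -> time of that IP's previous request
    for e in entries:
        ip = e["ip"]
        if ip not in last_seen or e["time"] - last_seen[ip] > SESSION_TIMEOUT:
            sessions.append({"ip": ip, "pages": []})
            current[ip] = len(sessions) - 1
        sessions[current[ip]]["pages"].append(e["url"])
        last_seen[ip] = e["time"]
    return sessions
```

Calling `sessionize(clean(merge(log_a, log_b)))` on two lists of raw log lines yields one record per session, each holding the IP address and the ordered list of pages requested. Identifying users by IP alone is a simplification; proxies and shared addresses would require additional fields such as the user agent.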

Taxonomy

Topics: Data Mining Algorithms and Applications · Data Management and Algorithms · Rough Sets and Fuzzy Logic