Web Usage mining framework for Data Cleaning and IP address Identification

Priyanka Verma; Nishtha Kesswani

arXiv:1408.5460 · cs.DB · August 26, 2014 · 5 cites

Open Access

TL;DR

This paper presents a framework for web usage mining focusing on data cleaning and IP address identification to improve user pattern analysis from web logs.

Contribution

It proposes new methodologies for data cleaning and IP address identification in web log preprocessing, enhancing user behavior analysis accuracy.

Findings

01. Number of users identified after IP address processing.
02. Improved data quality for web usage mining.
03. Enhanced accuracy in user pattern detection.

Abstract

The World Wide Web is the most widely known information source that is easily available and searchable. It consists of billions of interconnected documents (Web pages) that are authored by millions of people. Accesses made by various users to pages are recorded in web logs. These log files exist in various formats. As usage of the web increases, the size of web log files is growing rapidly. Web mining is the application of data mining techniques to these log files. It is of three types: Web usage mining, Web structure mining and Web content mining. Web usage mining is the mining of users' usage patterns, which can then be used to personalize web sites and make them more attractive. It consists of three main phases: preprocessing, pattern discovery and pattern analysis. In this paper we focus on the data cleaning and IP address identification stages of preprocessing. Methodology…
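
The two preprocessing stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' methodology, which the abstract does not spell out here. It assumes Common Log Format input and applies two cleaning rules that are assumptions for illustration: drop requests for embedded resources (images, stylesheets, scripts) and drop requests whose status code is not 2xx. The names `LOG_PATTERN`, `IGNORED_SUFFIXES`, `clean_log` and `distinct_ips` are hypothetical.

```python
import re

# Assumed input: Common Log Format lines (the log format used in the paper is not stated here).
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical cleaning rule: requests for these resource types are treated as noise.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def clean_log(lines):
    """Return the parsed entries that survive the assumed cleaning rules."""
    cleaned = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not parse as Common Log Format
        entry = match.groupdict()
        path = entry["url"].split("?", 1)[0].lower()  # ignore any query string
        if path.endswith(IGNORED_SUFFIXES):
            continue  # embedded resource, not a page view
        if not entry["status"].startswith("2"):
            continue  # keep successful requests only
        cleaned.append(entry)
    return cleaned


def distinct_ips(entries):
    """Return the sorted set of client IP addresses seen in the cleaned entries."""
    return sorted({entry["ip"] for entry in entries})


sample = [
    '192.0.2.1 - - [26/Aug/2014:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '192.0.2.1 - - [26/Aug/2014:10:00:01 +0000] "GET /logo.png HTTP/1.1" 200 512',
    '192.0.2.7 - - [26/Aug/2014:10:00:05 +0000] "GET /about.html HTTP/1.1" 200 2048',
    '192.0.2.9 - - [26/Aug/2014:10:00:09 +0000] "GET /missing.html HTTP/1.1" 404 0',
]
entries = clean_log(sample)
print(len(entries), distinct_ips(entries))
```

In this sketch each distinct IP address is treated as one candidate user. That is a simplification, since several users may share an address behind a proxy and one user may appear under several addresses.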

Taxonomy

Topics: Web Data Mining and Analysis · Recommender Systems and Techniques · Caching and Content Delivery