Realistic Traffic Generation for Web Robots
Kyle Brown, Derek Doran

TL;DR
This paper presents a new method for generating realistic web robot traffic that accurately mimics real-world behavior, aiding system testing and evaluation.
Contribution
It introduces a novel traffic generator based on statistical and Bayesian models that captures the temporal and behavioral traits of web robots.
Findings
Generated traffic closely matches real robot traffic in session metrics
The traffic impacts cache performance similarly to real traffic
Models are fitted to logs from North America and Europe
Abstract
Realistic web traffic generators are critical to evaluating the capacity, scalability, and availability of web systems. Although web traffic generation is a classic research problem, no generator accounts for the characteristics of web robots or crawlers, which are now the dominant source of traffic to a web server. Administrators are thus unable to test, stress, and evaluate how their systems perform in the face of ever-increasing levels of web robot traffic. To resolve this problem, this paper introduces a novel approach to generating synthetic web robot traffic with high fidelity. It generates traffic that accounts for both the temporal and behavioral qualities of robot traffic using statistical and Bayesian models that are fitted to the properties of robot traffic seen in web logs from North America and Europe. We evaluate our traffic generator by comparing the characteristics of generated traffic…
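The abstract does not say which statistical or Bayesian models the authors fit, so the following is only an illustrative sketch of the general idea of combining a temporal model with a behavioral one. It assumes exponentially distributed gaps between requests and a first-order Markov chain over resource types as a stand-in for the behavioral model. The function `generate_session`, the `TRANSITIONS` table, and every probability in it are hypothetical and not taken from the paper.

```python
import random

# Hypothetical resource categories and transition probabilities.
# These numbers are made up for illustration, not fitted to any log.
TRANSITIONS = {
    "html":  {"html": 0.6, "image": 0.3, "other": 0.1},
    "image": {"html": 0.5, "image": 0.4, "other": 0.1},
    "other": {"html": 0.7, "image": 0.1, "other": 0.2},
}


def generate_session(n_requests, mean_gap_s=2.0, start="html", seed=None):
    """Return a list of (timestamp_seconds, resource_type) tuples.

    Temporal side: gaps between requests are drawn from an exponential
    distribution with the given mean (an assumption).
    Behavioral side: the next resource type depends only on the current
    one, via the TRANSITIONS table (also an assumption).
    """
    rng = random.Random(seed)
    t = 0.0
    kind = start
    session = []
    for _ in range(n_requests):
        session.append((t, kind))
        # Advance the clock by an exponentially distributed gap.
        t += rng.expovariate(1.0 / mean_gap_s)
        # Pick the next resource type given the current one.
        choices = TRANSITIONS[kind]
        kind = rng.choices(list(choices), weights=list(choices.values()))[0]
    return session


if __name__ == "__main__":
    for ts, kind in generate_session(5, seed=42):
        print(f"{ts:8.3f}s  {kind}")
```

In a real generator the gap distribution and the transition table would be estimated from server logs rather than hard-coded, and the synthetic sessions would then be compared against held-out real sessions on the same metrics.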
