Automated generation of web server fingerprints
Theodore Book, Martha Witick, Dan S. Wallach

TL;DR
This paper presents a method for automatically generating web server fingerprints by applying Bayesian inference to server response codes, so that server types can be identified without relying on reported version strings.
Contribution
The paper introduces an approach to web server fingerprinting that needs neither an a priori catalog of server features nor version strings; it applies multifactor Bayesian inference to server response codes.
Findings
Predicted server types from response codes alone, independently of the version string
Collected data by sending specialized requests to 110,000 live web servers
Noted several distinguishing features of current web infrastructure
Abstract
In this paper, we demonstrate that it is possible to automatically generate fingerprints for various web server types using multifactor Bayesian inference on randomly selected servers on the Internet, without building an a priori catalog of server features or behaviors. This makes it possible to conclusively study web server distribution without relying on reported (and variable) version strings. We gather data by sending a collection of specialized requests to 110,000 live web servers. Using only the server response codes, we then train an algorithm to successfully predict server types independently of the server version string. In the process, we note several distinguishing features of current web infrastructure.
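To make the idea of inferring a server type from response codes concrete, here is a minimal sketch using a naive Bayes classifier with add-one smoothing. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not specify the model beyond "multifactor Bayesian inference", and the function names, the probe ordering, the server labels, and the tiny training set below are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """Count labels and per-probe response codes.

    samples: list of (label, codes), where codes is a tuple holding one
    HTTP status code per probe request (hypothetical data layout).
    """
    label_counts = Counter()
    code_counts = defaultdict(Counter)  # (label, probe index) -> code counts
    seen_codes = defaultdict(set)       # probe index -> codes seen in training
    for label, codes in samples:
        label_counts[label] += 1
        for i, code in enumerate(codes):
            code_counts[(label, i)][code] += 1
            seen_codes[i].add(code)
    return label_counts, code_counts, seen_codes

def classify(model, codes):
    """Return the label with the highest posterior log-probability."""
    label_counts, code_counts, seen_codes = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        for i, code in enumerate(codes):
            # Add-one smoothing so unseen codes do not zero out the posterior.
            vocab = len(seen_codes[i] | {code})
            score += math.log((code_counts[(label, i)][code] + 1) / (n + vocab))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data: three probes per server, made-up labels and codes.
samples = [
    ("server_a", (400, 200, 501)),
    ("server_a", (400, 200, 501)),
    ("server_b", (400, 405, 400)),
    ("server_b", (400, 405, 400)),
]
model = train(samples)
print(classify(model, (400, 200, 501)))
```

The naive independence assumption between probes is a simplification chosen for brevity; a real study would need many more probes and servers than this toy input.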
Taxonomy
Topics: Web Data Mining and Analysis · Software Testing and Debugging Techniques · Network Security and Intrusion Detection
