Automated generation of web server fingerprints

Theodore Book; Martha Witick; Dan S. Wallach

arXiv:1305.0245 · cs.CR · May 2, 2013 · 1 cites

TL;DR

This paper presents a method for automatically generating web server fingerprints by applying multifactor Bayesian inference to server response codes, so that server types can be identified without relying on self-reported version strings.

Contribution

It introduces a novel approach to web server fingerprinting that does not depend on pre-existing catalogs or version strings, using response code analysis and Bayesian inference.

Findings

01 Successfully identified server types from response codes

02 Analyzed 110,000 live web servers

03 Revealed key features of web infrastructure

Abstract

In this paper, we demonstrate that it is possible to automatically generate fingerprints for various web server types using multifactor Bayesian inference on randomly selected servers on the Internet, without building an a priori catalog of server features or behaviors. This makes it possible to conclusively study web server distribution without relying on reported (and variable) version strings. We gather data by sending a collection of specialized requests to 110,000 live web servers. Using only the server response codes, we then train an algorithm to successfully predict server types independently of the server version string. In the process, we note several distinguishing features of current web infrastructure.
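To make the idea of predicting a server type from response codes more concrete, here is a minimal sketch in Python. It is not the authors' implementation: it swaps in a plain naive Bayes classifier with Laplace smoothing as a simplified stand-in for the multifactor Bayesian inference the abstract mentions. The function names `train` and `predict`, the labels `type_a` and `type_b`, and every response code in `toy_samples` are invented for illustration and are not data from the paper.

```python
import math
from collections import Counter, defaultdict


def train(samples, alpha=1.0):
    """Count response codes per (server type, probe position).

    samples: list of (server_type, tuple_of_response_codes), where position i
    in the tuple is the status code returned for the i-th probe request.
    alpha: Laplace smoothing constant (an assumption of this sketch).
    """
    class_counts = Counter()
    code_counts = defaultdict(Counter)  # (server_type, probe index) -> code counts
    vocab = defaultdict(set)            # probe index -> codes seen for that probe
    for label, codes in samples:
        class_counts[label] += 1
        for i, code in enumerate(codes):
            code_counts[(label, i)][code] += 1
            vocab[i].add(code)
    return class_counts, code_counts, vocab, alpha


def predict(model, codes):
    """Return the server type with the highest posterior log-probability."""
    class_counts, code_counts, vocab, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for i, code in enumerate(codes):
            seen = code_counts[(label, i)][code]
            k = len(vocab[i]) + 1  # one extra bucket for never-seen codes
            score += math.log((seen + alpha) / (n + alpha * k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: labels and codes are made up, not taken from the paper.
toy_samples = [
    ("type_a", (400, 200, 501)),
    ("type_a", (400, 200, 501)),
    ("type_a", (400, 404, 501)),
    ("type_b", (400, 405, 400)),
    ("type_b", (400, 405, 400)),
]
model = train(toy_samples)
print(predict(model, (400, 200, 501)))
print(predict(model, (400, 405, 400)))
```

Treating each probe position as an independent feature keeps the sketch short; a real study would need labeled training data and a way to handle servers that do not respond to some probes, neither of which is modeled here.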

Taxonomy

Topics: Web Data Mining and Analysis · Software Testing and Debugging Techniques · Network Security and Intrusion Detection