Towards Reproducible Network Traffic Analysis
Jordan Holland, Paul Schmitt, Prateek Mittal, Nick Feamster

TL;DR
This paper addresses the reproducibility crisis in network traffic analysis by proposing a standardization framework and introducing pcapML, an open-source system for encoding metadata into traffic captures, along with benchmarks to track progress.
Contribution
It introduces pcapML, a novel standardization system for network traffic data, and establishes benchmarks to improve reproducibility and comparability of analysis methods.
Findings
Evidence of irreproducibility due to dataset interpretation differences
pcapML enables consistent encoding of metadata in traffic captures
Benchmark platform tracks progress of analysis techniques
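The core idea behind pcapML is to encode metadata directly into the traffic capture, so that everyone working with a dataset starts from the same interpretation of it. The Python sketch below is a rough illustration of that idea under stated assumptions. It writes a minimal little-endian pcapng file in which each Enhanced Packet Block carries an opt_comment option holding a "sample_id,label" string, then reads those labels back. The function names write_labeled_pcapng and read_labels and the "sample_id,label" comment layout are hypothetical. They are not pcapML's actual API or guaranteed to match its on-disk convention. Only the block and option layout follows the pcapng specification.

```python
import struct

# Block and option codes from the pcapng specification.
SHB_TYPE = 0x0A0D0D0A       # Section Header Block
IDB_TYPE = 0x00000001       # Interface Description Block
EPB_TYPE = 0x00000006       # Enhanced Packet Block
OPT_ENDOFOPT = 0
OPT_COMMENT = 1
LINKTYPE_ETHERNET = 1


def _pad4(data: bytes) -> bytes:
    """Pad to a 32-bit boundary, as pcapng requires for packet data and options."""
    return data + b"\x00" * (-len(data) % 4)


def _block(block_type: int, body: bytes) -> bytes:
    """Wrap a body with the type/length header and the trailing length footer."""
    total = 12 + len(body)
    return struct.pack("<II", block_type, total) + body + struct.pack("<I", total)


def _comment_option(text: str) -> bytes:
    """Encode one opt_comment option followed by opt_endofopt."""
    value = text.encode("utf-8")
    return (struct.pack("<HH", OPT_COMMENT, len(value)) + _pad4(value)
            + struct.pack("<HH", OPT_ENDOFOPT, 0))


def write_labeled_pcapng(path, packets):
    """Hypothetical helper. packets: iterable of (timestamp_us, raw_bytes, sample_id, label)."""
    with open(path, "wb") as f:
        # Byte-order magic, version 1.0, section length -1 (unspecified).
        f.write(_block(SHB_TYPE, struct.pack("<IHHq", 0x1A2B3C4D, 1, 0, -1)))
        # A single Ethernet interface; timestamps default to microsecond resolution.
        f.write(_block(IDB_TYPE, struct.pack("<HHI", LINKTYPE_ETHERNET, 0, 65535)))
        for ts, data, sample_id, label in packets:
            body = struct.pack("<IIIII", 0, ts >> 32, ts & 0xFFFFFFFF, len(data), len(data))
            body += _pad4(data) + _comment_option(f"{sample_id},{label}")
            f.write(_block(EPB_TYPE, body))


def read_labels(path):
    """Hypothetical helper. Return [(sample_id, label), ...] from EPB comments (little-endian only)."""
    with open(path, "rb") as f:
        data = f.read()
    out, pos = [], 0
    while pos < len(data):
        block_type, total = struct.unpack_from("<II", data, pos)
        if block_type == EPB_TYPE:
            cap_len = struct.unpack_from("<I", data, pos + 20)[0]
            opt = pos + 28 + cap_len + (-cap_len % 4)  # options follow padded packet data
            end = pos + total - 4
            while opt < end:
                code, length = struct.unpack_from("<HH", data, opt)
                if code == OPT_ENDOFOPT:
                    break
                if code == OPT_COMMENT:
                    text = data[opt + 4:opt + 4 + length].decode("utf-8")
                    sample_id, label = text.split(",", 1)
                    out.append((sample_id, label))
                opt += 4 + length + (-length % 4)
        pos += total
    return out
```

Because the metadata travels inside a standard pcapng option rather than in a separate side file, general-purpose pcapng tools can still open the capture. Wireshark, for example, displays packet comments. Any tool that preserves comments keeps the labels attached to the packets they describe.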
Abstract
Analysis techniques are critical for gaining insight into network traffic given both the higher proportion of encrypted traffic and increasing data rates. Unfortunately, the domain of network traffic analysis suffers from a lack of standardization, leading to incomparable results and barriers to reproducibility. Unlike in other disciplines, no standard dataset format exists, forcing researchers and practitioners to create bespoke analysis pipelines for each individual task. Without standardization, researchers cannot compare "apples-to-apples", preventing us from knowing with certainty whether a new technique represents a methodological advancement or simply benefits from a different interpretation of a given dataset. In this work, we examine irreproducibility that arises from the lack of standardization in network traffic analysis. First, we study the literature, highlighting evidence…
Taxonomy
Topics: Internet Traffic Analysis and Secure E-voting · Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection
