Exploring QUIC Dynamics: A Large-Scale Dataset for Encrypted Traffic Analysis
Barak Gahtan, Robert J. Shahla, Alex M. Bronstein, Reuven Cohen

TL;DR
This paper introduces VisQUIC, a comprehensive large-scale dataset of encrypted QUIC traffic with decryption keys and diverse implementations, enabling advanced machine learning analysis and benchmarking for network security and performance research.
Contribution
We present VisQUIC, a novel dataset with SSL keys, multiple QUIC implementations, and a new image-based representation for encrypted traffic analysis, supporting reproducible benchmarking.
Findings
Achieved 97% accuracy in estimating the number of HTTP/3 responses from encrypted traffic.
Provided a large-scale, labeled dataset with decryption keys and diverse implementations.
Enabled machine learning analysis of encrypted QUIC traffic.
Abstract
The increasing adoption of the QUIC transport protocol has transformed encrypted web traffic, necessitating new methodologies for network analysis. However, existing datasets lack the scope, metadata, and decryption capabilities required for robust benchmarking in encrypted traffic research. We introduce VisQUIC, a large-scale dataset of 100,000 labeled QUIC traces from over 44,000 websites, collected over four months. Unlike prior datasets, VisQUIC provides SSL keys for controlled decryption, supports multiple QUIC implementations (Chromium QUIC, Facebook's mvfst, Cloudflare's quiche), and introduces a novel image-based representation that enables machine learning-driven encrypted traffic analysis. The dataset includes standardized benchmarking tools, ensuring reproducibility. To demonstrate VisQUIC's utility, we present a benchmarking task for estimating HTTP/3 responses in encrypted…
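To make the idea of an image-based representation concrete, here is a minimal sketch of how one time window of a packet trace might be binned into a small two-channel grid. It is not the authors' pipeline: the function name `trace_window_to_image`, the `(timestamp, length, direction)` packet tuple, the 1500-byte length cap, and the peak normalization are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical packet record: (timestamp_seconds, length_bytes, direction),
# where direction is 0 for client-to-server and 1 for server-to-client.
Packet = Tuple[float, int, int]
Image = List[List[List[float]]]


def trace_window_to_image(
    packets: List[Packet],
    window_start: float,
    window_length: float = 0.1,   # assumed window size in seconds
    resolution: int = 32,         # assumed grid size (rows = length bins, cols = time bins)
    max_packet_len: int = 1500,   # assumed cap used to scale packet lengths
) -> Image:
    """Bin one time window into a resolution x resolution x 2 grid of packet counts."""
    image: Image = [
        [[0.0, 0.0] for _ in range(resolution)] for _ in range(resolution)
    ]
    for timestamp, length, direction in packets:
        offset = timestamp - window_start
        if not (0.0 <= offset < window_length):
            continue  # packet falls outside this window
        # Column index encodes arrival time, row index encodes packet length.
        col = min(int(offset / window_length * resolution), resolution - 1)
        row = min(int(length / max_packet_len * resolution), resolution - 1)
        image[row][col][direction] += 1.0

    # Normalize counts to [0, 1] by the largest bin (one of several possible schemes).
    peak = max(value for row in image for cell in row for value in cell)
    if peak > 0:
        image = [[[value / peak for value in cell] for cell in row] for row in image]
    return image


if __name__ == "__main__":
    demo = [(0.0, 100, 0), (0.5, 1400, 1), (0.5, 1400, 1), (2.0, 500, 0)]
    grid = trace_window_to_image(demo, 0.0, window_length=1.0, resolution=4)
    for row in grid:
        print(row)
```

Keeping the two directions in separate channels lets a model see request and response activity in the same time bin without mixing them; a different channel layout or normalization would work equally well for this illustration.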
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
* It introduces a large-scale dataset comprising over 100,000 QUIC traces from more than 44,000 websites, collected over four months. Obviously, such work takes great effort, and it provides a valuable resource for the community to analyze QUIC traffic.
* It generates over seven million labeled images from the QUIC traces, with configurable parameters such as window length, pixel resolution, normalization, and labels. These images enable an observer to analyze and understand QUIC encrypted conn
* The presentation of this paper is feeble; in many cases, it looks more like a technical report than an academic paper. The following points illustrate this:
  - A major issue is that the authors barely state their motivation before carrying out each step. For a dataset paper, typically the first thing the authors should write about is "Why do we need a new dataset" (after the paper reviews the existing datasets): Does the network environment change a lot over the years? Are any emerging proto
The paper provides a network dataset converted into images. The authors claim that the dataset can help understand the QUIC protocol dynamics when the traffic is encrypted.
• In the abstract, line 13, the authors state: _"These features, however, also present challenges for network operators who need to monitor and analyze web traffic."_ Yet, the paper does not clarify how the proposed method addresses this issue. Similarly, lines 33-35 mention, _"Traditional traffic analysis methods are less effective with QUIC due to its encryption, necessitating innovative approaches to manage network performance and its effects on latency, error rates, and congestion control."_
The paper open-sources a dataset, which may have independent value; it is unclear, though, whether the dataset's scale and the choices made make this a relevant effort for the ICLR community.
The paper has several limitations. It is unclear if this type of paper is a good fit for the ICLR community: after all, ICLR stands for International Conference on Learning Representations, whereas here the representation is given; lexicographical jokes aside, it is unclear if this is an important dataset for the ICLR community, and in this reviewer's opinion the answer leans toward the negative side (the labels are automatically extracted as the number of objects carried in the stream so the da
- analyzing QUIC dynamics is an important problem
- 100k+ QUIC traces have been collected
- critical information is missing regarding the construction of the dataset (see questions below)
- the platform for collecting the dataset (=code) is not provided by the authors
- the dataset (which is the core of the paper) is not public but upon request
- the conversion of traces to images is not new
- regarding the regression problem, it is a classic regression problem, not an ordinal regression problem. The predicted variable is the number of observed responses, which is discrete but not re
Taxonomy
Topics: Internet Traffic Analysis and Secure E-voting · Network Security and Intrusion Detection
