Knowledge-to-Data: LLM-Driven Synthesis of Structured Network Traffic for Testbed-Free IDS Evaluation
Konstantinos E. Kampourakis, Vyron Kampourakis, Efstratios Chatzoglou, Georgios Kambourakis, Stefanos Gritzalis

TL;DR
This paper demonstrates that Large Language Models can generate realistic, labeled network traffic datasets for IDS evaluation by combining protocol knowledge and statistical rules, offering a privacy-preserving alternative to traditional data collection methods.
Contribution
It introduces a methodology for using LLMs as knowledge-to-data engines to produce synthetic network traffic without fine-tuning or raw data access, validated on a complex IEEE 802.11 benchmark.
Findings
LLM-generated datasets closely match real traffic statistics
Gradient-boosting classifiers achieve F1-scores up to 0.956 on real data
Constrained LLMs enable testbed-free, privacy-preserving IDS evaluation
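For readers unfamiliar with the metric quoted in the findings, the sketch below computes an F1-score as the harmonic mean of precision and recall. It assumes a binary attack/benign labeling and uses made-up labels. The paper's averaging scheme and class setup are not stated in this summary, so the function name `f1_score` and the sample data are purely illustrative.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only (1 = attack, 0 = benign); not data from the paper.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
score = f1_score(y_true, y_pred)
```

With these sample labels there are four true positives, one false positive, and one false negative, so precision and recall are both 0.8.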
Abstract
Realistic, large-scale, and well-labeled cybersecurity datasets are essential for training and evaluating Intrusion Detection Systems (IDS). However, they remain difficult to obtain due to privacy constraints, data sensitivity, and the cost of building controlled collection environments such as testbeds and cyber ranges. This paper investigates whether Large Language Models (LLMs) can operate as controlled knowledge-to-data engines for generating structured synthetic network traffic datasets suitable for IDS research. We propose a methodology that combines protocol documentation, attack semantics, and explicit statistical rules to condition LLMs without fine-tuning or access to raw samples. Using the AWID3 IEEE 802.11 benchmark as a demanding case study, we generate labeled datasets with four state-of-the-art LLMs and assess fidelity through a multi-level validation framework including…
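The abstract describes conditioning an LLM on protocol documentation, attack semantics, and explicit statistical rules. The sketch below shows one way such a prompt might be assembled and how generated rows could be checked against the rules afterwards. It is not the authors' implementation: `FeatureRule`, `build_prompt`, `row_satisfies_rules`, the feature names, and the numeric ranges are all hypothetical, and the actual LLM call is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRule:
    """Hypothetical statistical rule: a numeric feature and its allowed range."""
    name: str
    minimum: float
    maximum: float


def build_prompt(protocol_notes, attack_notes, rules, label, n_rows):
    """Assemble a text prompt from the three knowledge sources plus an output spec."""
    header = ",".join([r.name for r in rules] + ["label"])
    rule_lines = [f"- {r.name}: between {r.minimum} and {r.maximum}" for r in rules]
    return "\n".join([
        "Protocol documentation:",
        protocol_notes,
        "Attack semantics:",
        attack_notes,
        "Statistical rules:",
        *rule_lines,
        f"Generate {n_rows} CSV rows labeled '{label}' with header: {header}",
    ])


def row_satisfies_rules(row, rules):
    """Return True if a generated row (feature name -> value) respects every rule."""
    return all(
        r.name in row and r.minimum <= float(row[r.name]) <= r.maximum
        for r in rules
    )


# Placeholder features and ranges, chosen only for illustration.
rules = [
    FeatureRule("frame_length_bytes", 24, 1500),
    FeatureRule("inter_arrival_ms", 0.0, 50.0),
]
prompt = build_prompt(
    "(summary of the relevant protocol fields)",
    "(description of the attack behaviour)",
    rules,
    label="attack",
    n_rows=5,
)
ok = row_satisfies_rules({"frame_length_bytes": 128, "inter_arrival_ms": 3.2}, rules)
bad = row_satisfies_rules({"frame_length_bytes": 9000, "inter_arrival_ms": 3.2}, rules)
```

Keeping the rules as data lets the same objects drive both the prompt text and the post-generation check, so rows that violate a stated range can be rejected before they enter the dataset.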
Taxonomy
Topics: Network Security and Intrusion Detection · Internet Traffic Analysis and Secure E-voting · Network Packet Processing and Optimization
