Knowledge-to-Data: LLM-Driven Synthesis of Structured Network Traffic for Testbed-Free IDS Evaluation

Konstantinos E. Kampourakis; Vyron Kampourakis; Efstratios Chatzoglou; Georgios Kambourakis; Stefanos Gritzalis

arXiv:2601.05022·cs.CR·January 9, 2026

Open Access

TL;DR

This paper demonstrates that Large Language Models can generate realistic, labeled network traffic datasets for IDS evaluation by combining protocol knowledge and statistical rules, offering a privacy-preserving alternative to traditional data collection methods.

Contribution

It introduces a methodology for using LLMs as knowledge-to-data engines to produce synthetic network traffic without fine-tuning or raw data access, validated on a complex IEEE 802.11 benchmark.

Findings

01. LLM-generated datasets closely match real traffic statistics
02. Gradient-boosting classifiers achieve F1-scores up to 0.956 on real data
03. Constrained LLMs enable testbed-free, privacy-preserving IDS evaluation
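
For reference, the F1-score cited in the second finding is the harmonic mean of precision and recall. The following is a minimal sketch of the binary case only; the helper name `f1_score_binary` and the labels are made up for illustration and are not taken from the paper, which may report a multi-class average instead.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return the F1-score of the `positive` class (harmonic mean of precision and recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only (1 = attack, 0 = normal); not data from the paper.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
score = f1_score_binary(y_true, y_pred)  # 3 TP, 1 FP, 1 FN
```

With three true positives, one false positive, and one false negative, precision and recall are both 0.75, so the F1-score is also 0.75.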

Abstract

Realistic, large-scale, and well-labeled cybersecurity datasets are essential for training and evaluating Intrusion Detection Systems (IDS). However, they remain difficult to obtain due to privacy constraints, data sensitivity, and the cost of building controlled collection environments such as testbeds and cyber ranges. This paper investigates whether Large Language Models (LLMs) can operate as controlled knowledge-to-data engines for generating structured synthetic network traffic datasets suitable for IDS research. We propose a methodology that combines protocol documentation, attack semantics, and explicit statistical rules to condition LLMs without fine-tuning or access to raw samples. Using the AWID3 IEEE 802.11 benchmark as a demanding case study, we generate labeled datasets with four state-of-the-art LLMs and assess fidelity through a multi-level validation framework including…
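
The conditioning idea in the abstract (protocol documentation, attack semantics, and explicit statistical rules, with no raw samples) can be pictured roughly as follows. This is only a sketch under stated assumptions: the `FieldRule` format, the function names, the column names `frame_len` and `signal_dbm`, and the numeric ranges are all hypothetical placeholders, not the authors' prompts or the AWID3 schema, and no LLM is actually called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    """Allowed numeric range for one column (hypothetical rule format)."""
    name: str
    minimum: float
    maximum: float


def build_prompt(protocol_notes, attack_notes, rules, n_rows):
    """Assemble a text prompt from knowledge sources only; no raw samples are included."""
    rule_lines = [f"- {r.name}: between {r.minimum} and {r.maximum}" for r in rules]
    header = ",".join([r.name for r in rules] + ["label"])
    return "\n".join([
        "Protocol knowledge:", protocol_notes,
        "Attack semantics:", attack_notes,
        "Statistical rules:", *rule_lines,
        f"Generate {n_rows} CSV rows with header: {header}",
    ])


def violations(row, rules):
    """Return the names of fields in `row` that are missing or outside their allowed range."""
    bad = []
    for r in rules:
        value = row.get(r.name)
        if value is None or not (r.minimum <= value <= r.maximum):
            bad.append(r.name)
    return bad


# Placeholder rules and notes, invented for illustration.
rules = [FieldRule("frame_len", 24, 2346), FieldRule("signal_dbm", -100, 0)]
prompt = build_prompt(
    "IEEE 802.11 management frames carry a subtype field.",
    "A deauthentication flood sends many deauthentication frames.",
    rules,
    100,
)

# A row as it might look after parsing an LLM's CSV output (made up here).
row = {"frame_len": 5000, "signal_dbm": -60, "label": "deauth"}
bad_fields = violations(row, rules)  # frame_len exceeds its placeholder maximum
```

Checking generated rows against the same rules that went into the prompt is one simple way to reject output that ignores the stated constraints before any statistical comparison with real traffic.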

Taxonomy

Topics: Network Security and Intrusion Detection · Internet Traffic Analysis and Secure E-voting · Network Packet Processing and Optimization