From Flows to Words: Can Zero-/Few-Shot LLMs Detect Network Intrusions? A Grammar-Constrained, Calibrated Evaluation on UNSW-NB15
Mohammad Abdul Rehman, Syed Imad Ali Shah, Abbas Anwar, Noor Islam

TL;DR
This paper evaluates zero- and few-shot large language models for network intrusion detection on UNSW-NB15 by converting network flows into text and using structured prompts. Unguided prompting proves unreliable, while instruction-guided prompts with flags improve detection and keep the approach interpretable without fine-tuning.
Contribution
It introduces a novel flow-to-text protocol with interpretable cues, a calibration method for decision thresholds, and a comprehensive baseline comparison for LLM-based intrusion detection.
Findings
Instruction-guided prompts with flags improve detection accuracy.
Calibrated scoring stabilizes results across different models.
Prompt-only approach offers interpretability and easy adaptation.
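The calibration step mentioned above can be illustrated with a minimal sketch. It assumes the model's structured response yields a numeric attack score per flow and that the single threshold is chosen to maximize F1 on the development split; the summary does not state the paper's selection criterion, so the F1 objective, the function names, and the toy scores are all hypothetical.

```python
from typing import Sequence, Tuple


def f1_at_threshold(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1 of the attack class when flows with score >= threshold are flagged."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def calibrate_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Pick the single threshold (from observed scores) that maximizes dev-split F1.

    Assumption: F1 is the calibration objective; the paper may use another criterion.
    """
    candidates = sorted(set(scores))
    best_threshold, best_f1 = candidates[0], -1.0
    for candidate in candidates:
        f1 = f1_at_threshold(scores, labels, candidate)
        if f1 > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_threshold, best_f1 = candidate, f1
    return best_threshold, best_f1


# Toy development split (made-up scores; 1 = attack, 0 = normal).
dev_scores = [0.10, 0.35, 0.40, 0.65, 0.80, 0.90]
dev_labels = [0, 0, 1, 0, 1, 1]
threshold, dev_f1 = calibrate_threshold(dev_scores, dev_labels)
```

The threshold found on the development split would then be frozen and applied unchanged to the test split, which is what makes scores comparable across models.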
Abstract
Large Language Models (LLMs) can reason over natural-language inputs, but their role in intrusion detection without fine-tuning remains uncertain. This study evaluates a prompt-only approach on UNSW-NB15 by converting each network flow to a compact textual record and augmenting it with lightweight, domain-inspired boolean flags (asymmetry, burst rate, TTL irregularities, timer anomalies, rare service/state, short bursts). To reduce output drift and support measurement, the model is constrained to produce structured, grammar-valid responses, and a single decision threshold is calibrated on a small development split. We compare zero-shot, instruction-guided, and few-shot prompting to strong tabular and neural baselines under identical splits, reporting accuracy, precision, recall, F1, and macro scores. Empirically, unguided prompting is unreliable, while instructions plus flags…
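As a rough illustration of the flow-to-text step the abstract describes, the sketch below renders one flow as a compact text record and appends a few boolean flags. The field names follow common UNSW-NB15 column names (`sbytes`, `dbytes`, `sttl`, `dur`, and so on), but the flag definitions, thresholds, set of typical TTL values, and output layout are assumptions made for illustration, not the paper's actual protocol. Only four of the listed flags are shown.

```python
from typing import Dict, Union

Flow = Dict[str, Union[str, int, float]]

# Assumption: TTL values treated as "typical"; anything else is flagged.
COMMON_TTLS = {32, 64, 128, 255}


def compute_flags(flow: Flow) -> Dict[str, bool]:
    """Derive boolean cues from one flow. All thresholds are hypothetical."""
    sbytes, dbytes = flow["sbytes"], flow["dbytes"]
    total_pkts = flow["spkts"] + flow["dpkts"]
    dur = flow["dur"]
    # Packets per second; fall back to the packet count for zero-duration flows.
    rate = total_pkts / dur if dur > 0 else float(total_pkts)
    return {
        # One direction carries more than 10x the bytes of the other.
        "asymmetry": max(sbytes, dbytes) > 10 * max(1, min(sbytes, dbytes)),
        "burst_rate": rate > 1000.0,
        "ttl_irregular": flow["sttl"] not in COMMON_TTLS,
        "short_burst": dur < 0.01 and total_pkts >= 2,
    }


def flow_to_text(flow: Flow) -> str:
    """Render a flow as a single-line record followed by its active flags."""
    fields = (
        "proto={proto} service={service} state={state} dur={dur:.4f}s "
        "spkts={spkts} dpkts={dpkts} sbytes={sbytes} dbytes={dbytes} sttl={sttl}"
    ).format(**flow)
    active = [name for name, on in compute_flags(flow).items() if on]
    return f"{fields} flags={','.join(active) if active else 'none'}"


# Made-up flows for demonstration only.
suspicious = {"proto": "udp", "service": "dns", "state": "INT", "dur": 0.001,
              "spkts": 2, "dpkts": 0, "sbytes": 1000, "dbytes": 0, "sttl": 254}
benign = {"proto": "tcp", "service": "http", "state": "FIN", "dur": 1.0,
          "spkts": 10, "dpkts": 10, "sbytes": 500, "dbytes": 800, "sttl": 64}

suspicious_text = flow_to_text(suspicious)
benign_text = flow_to_text(benign)
```

Keeping the flags as named tokens in the record, rather than raw numbers alone, is what lets an instruction-guided prompt refer to them directly and makes the model's decision easier to inspect.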
