TSNBench: Benchmarking LLM Proficiency in Time-Sensitive Networking
Rubi Debnath, Daniel Bujosa Mateu, Luxi Zhao, Silviu S. Craciunas, Paul Pop, Sebastian Steinhorst

TL;DR
TSNBench is a new benchmark that evaluates large language models' proficiency in safety-critical Time-Sensitive Networking (TSN). It reveals significant gaps on open-ended delay computation tasks despite high accuracy on multiple-choice questions (MCQs).
Contribution
The paper introduces TSNBench, the first comprehensive benchmark for LLMs in TSN. It includes expert-validated MCQs and open-ended Worst-Case Delay (WCD) computation tasks with verified ground truth values.
Findings
LLMs achieve 67-95% accuracy on MCQs but perform poorly on WCD tasks.
GPT-5 achieves a mean absolute percentage error (MAPE) of 36.2% on CBS WCD prediction, the best among the tested models.
Most models' WCD prediction errors exceed 80%, risking real-time safety violations.
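For readers unfamiliar with the metric, the following is a minimal sketch of how a MAPE score is conventionally computed. It uses the standard textbook definition, not the paper's evaluation code; the function name `mape` and the delay values are hypothetical and chosen only for illustration.

```python
def mape(predicted, ground_truth):
    """Mean absolute percentage error, returned in percent.

    Standard definition: mean of |predicted - truth| / |truth| over all tasks.
    This is an illustrative helper, not the benchmark's own scoring code.
    """
    if len(predicted) != len(ground_truth) or not ground_truth:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, g in zip(predicted, ground_truth):
        if g == 0:
            raise ValueError("ground truth values must be non-zero")
        total += abs(p - g) / abs(g)
    return 100.0 * total / len(ground_truth)


# Hypothetical worst-case delays in microseconds (not taken from the paper).
truth = [100.0, 200.0, 400.0]
pred = [110.0, 150.0, 400.0]

score = mape(pred, truth)
print(f"MAPE: {score:.2f}%")
```

Note that MAPE ignores the sign of the error. In a worst-case analysis, a prediction below the true bound is the unsafe direction, so a per-task check of `p < g` may be worth reporting alongside the aggregate score.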
Abstract
We present TSNBench, the first benchmark for evaluating large language model (LLM) proficiency in Time-Sensitive Networking (TSN), a suite of IEEE 802.1 standards for deterministic communication with bounded latency in safety-critical domains such as autonomous vehicles, aviation, defense, and industrial automation. While LLMs have been extensively evaluated on general knowledge tasks, their capabilities in safety-critical networking domains remain largely unexplored. TSNBench comprises 939 expert-validated multiple-choice questions (MCQs) covering diverse TSN mechanisms, along with 100 open-ended Worst-Case Delay (WCD) computation tasks for Credit-Based Shaper (CBS) and Cyclic Queuing and Forwarding (CQF) across varying network topologies and traffic conditions. MCQ answers are validated by domain experts, and open-ended ground truth WCD values are computed using a verified Network…
