CritBench: A Framework for Evaluating Cybersecurity Capabilities of Large Language Models in IEC 61850 Digital Substation Environments

Gustav Keppler; Moritz Gstür; Veit Hagenmeyer

arXiv:2604.06019 · cs.CR · April 8, 2026

TL;DR

CritBench is a new framework for evaluating the cybersecurity skills of large language models specifically within IEC 61850 digital substation environments, addressing a gap in existing IT-focused assessments.

Contribution

The paper introduces CritBench, a domain-specific evaluation framework, and uses it to assess five models on 81 cybersecurity tasks in operational technology settings.

Findings

1. Models reliably perform static configuration analysis and network enumeration.
2. Performance drops on dynamic tasks requiring live system interaction.
3. Tool scaffolds improve the operational capabilities of models.

Abstract

The advancement of Large Language Models (LLMs) has raised concerns regarding their dual-use potential in cybersecurity. Existing evaluation frameworks overwhelmingly focus on Information Technology (IT) environments, failing to capture the constraints and specialized protocols of Operational Technology (OT). To address this gap, we introduce CritBench, a novel framework designed to evaluate the cybersecurity capabilities of LLM agents within IEC 61850 Digital Substation environments. We assess five state-of-the-art models, including OpenAI's GPT-5 suite and open-weight models, across a corpus of 81 domain-specific tasks spanning static configuration analysis, network traffic reconnaissance, and live virtual machine interaction. To facilitate industrial protocol interaction, we develop a domain-specific tool scaffold. Our empirical results show that agents reliably execute static…

Code & Models

Repositories

GKeppler/CritBench (GitHub)