CyberCertBench: Evaluating LLMs in Cybersecurity Certification Knowledge

Gustav Keppler; Ghada Elbez; Veit Hagenmeyer

arXiv:2604.20389 · cs.CR · April 23, 2026

TL;DR

CyberCertBench is a new benchmark suite for evaluating how well LLMs know the material covered by cybersecurity certifications. The paper also introduces a Proposer-Verifier framework that explains model performance in natural language, and it analyzes performance across standards and model scales.

Contribution

The paper presents CyberCertBench, a multiple-choice question answering (MCQA) benchmark for cybersecurity standards, and proposes a novel interpretability framework for LLM evaluation.

Findings

01. Frontier models reach human-expert-level performance on general networking and IT security knowledge.

02. Model accuracy drops on questions that require vendor-specific nuances or knowledge of formal standards.

03. Scaling trends show diminishing returns for larger models.

Abstract

The rapid evolution and use of Large Language Models (LLMs) in professional workflows require an evaluation of their domain-specific knowledge against industry standards. We introduce CyberCertBench, a new suite of Multiple Choice Question Answering (MCQA) benchmarks derived from industry-recognized certifications. CyberCertBench evaluates LLM domain knowledge against the professional standards of Information Technology (IT) cybersecurity and of more specialized areas such as Operational Technology (OT) and related cybersecurity standards. Concurrently, we propose and validate a novel Proposer-Verifier framework, a methodology to generate interpretable, natural-language explanations for model performance. Our evaluation shows that frontier models achieve human expert level in general networking and IT security knowledge. However, their accuracy declines on questions that require vendor-specific nuances or…
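Because CyberCertBench is framed as MCQA and the findings compare accuracy across question types, the basic measurement underneath is accuracy broken down by category. The sketch below is a minimal illustration under stated assumptions: a hypothetical `MCQItem` schema, a naive regex answer parser (`extract_letter`), and the convention that unparseable answers count as wrong. None of these names come from the paper or from the GKeppler/CyberCertBench repository, whose actual data format and scoring may differ.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class MCQItem:
    """One multiple-choice question. Hypothetical schema for illustration only."""
    question: str
    options: Tuple[str, ...]
    answer: str    # gold option letter, e.g. "B"
    category: str  # e.g. the certification or standard the question is drawn from


def extract_letter(response: str, n_options: int) -> Optional[str]:
    """Return the first standalone uppercase option letter in a response, or None.

    Naive parser: a leading article such as "A firewall ..." would be
    misread as option A, so real evaluations usually constrain the answer format.
    """
    if n_options < 1:
        raise ValueError("a question needs at least one option")
    letters = "".join(chr(ord("A") + i) for i in range(n_options))
    match = re.search(rf"\b([{letters}])\b", response)
    return match.group(1) if match else None


def accuracy_by_category(
    items: List[MCQItem], responses: List[str]
) -> Tuple[float, Dict[str, float]]:
    """Overall and per-category accuracy; unparseable responses count as wrong."""
    if not items or len(items) != len(responses):
        raise ValueError("need a non-empty item list and one response per item")
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for item, response in zip(items, responses):
        predicted = extract_letter(response, len(item.options))
        total[item.category] += 1
        correct[item.category] += int(predicted == item.answer)
    per_category = {cat: correct[cat] / total[cat] for cat in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_category


# Toy data only (not CyberCertBench questions or results).
opts = ("option 1", "option 2", "option 3", "option 4")
items = [
    MCQItem("Q1", opts, "B", "general"),
    MCQItem("Q2", opts, "C", "general"),
    MCQItem("Q3", opts, "A", "vendor-specific"),
    MCQItem("Q4", opts, "D", "vendor-specific"),
]
responses = ["B", "The answer is C.", "B", "no idea"]
overall, per_category = accuracy_by_category(items, responses)
print(overall, per_category)  # 0.5 {'general': 1.0, 'vendor-specific': 0.0}
```

Reporting accuracy per category rather than only overall is what makes a drop on, for example, vendor-specific questions visible even when the aggregate score looks strong.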

Code & Models

Repositories

GKeppler/CyberCertBench (GitHub)