SECURE: Benchmarking Large Language Models for Cybersecurity
Dipkamal Bhusal, Md Tanvirul Alam, Le Nguyen, Ashim Mahara, Zachary Lightcap, Rodney Frazier, Romy Fieblinger, Grace Long Torales, Benjamin A. Blakely, Nidhi Rastogi

TL;DR
SECURE is a new benchmark designed to evaluate large language models specifically in cybersecurity tasks, focusing on industrial control systems to assess their knowledge, understanding, and reasoning capabilities.
Contribution
The paper introduces SECURE, a cybersecurity-specific benchmark with datasets for industry-relevant tasks, and evaluates state-of-the-art LLMs to identify strengths and weaknesses.
Findings
Models show varied performance across tasks.
The benchmark reveals gaps in LLM cybersecurity understanding.
The paper offers recommendations for enhancing LLM reliability in cybersecurity.
Abstract
Large Language Models (LLMs) have demonstrated potential in cybersecurity applications but have also undermined confidence through problems such as hallucinations and a lack of truthfulness. Existing benchmarks provide general evaluations but do not sufficiently address the practical and applied aspects of LLM performance in cybersecurity-specific tasks. To address this gap, we introduce SECURE (Security Extraction, Understanding & Reasoning Evaluation), a benchmark designed to assess LLM performance in realistic cybersecurity scenarios. SECURE includes six datasets focused on the Industrial Control System sector to evaluate knowledge extraction, understanding, and reasoning based on industry-standard sources. Our study evaluates seven state-of-the-art models on these tasks, providing insights into their strengths and weaknesses in cybersecurity contexts, and offers recommendations…
Taxonomy
Topics: Digital and Cyber Forensics · Information and Cyber Security · Data Quality and Management
