SECURE: Benchmarking Large Language Models for Cybersecurity

Dipkamal Bhusal; Md Tanvirul Alam; Le Nguyen; Ashim Mahara; Zachary Lightcap; Rodney Frazier; Romy Fieblinger; Grace Long Torales; Benjamin A. Blakely; Nidhi Rastogi

arXiv:2405.20441·cs.CR·October 31, 2024·2 cites

TL;DR

SECURE is a benchmark for evaluating large language models on cybersecurity tasks. It focuses on the Industrial Control System sector and assesses the models' knowledge extraction, understanding, and reasoning capabilities.

Contribution

The paper introduces SECURE, a cybersecurity-specific benchmark comprising six datasets for industry-relevant tasks, and evaluates seven state-of-the-art LLMs on it to identify their strengths and weaknesses.

Findings

01

Models show varied performance across tasks.

02

Benchmark reveals gaps in LLM cybersecurity understanding.

03

The paper offers recommendations for enhancing LLM reliability in cybersecurity.

Abstract

Large Language Models (LLMs) have demonstrated potential in cybersecurity applications, but problems such as hallucinations and a lack of truthfulness have lowered confidence in them. Existing benchmarks provide general evaluations but do not sufficiently address the practical and applied aspects of LLM performance in cybersecurity-specific tasks. To address this gap, we introduce SECURE (Security Extraction, Understanding & Reasoning Evaluation), a benchmark designed to assess LLM performance in realistic cybersecurity scenarios. SECURE includes six datasets focused on the Industrial Control System sector to evaluate knowledge extraction, understanding, and reasoning based on industry-standard sources. Our study evaluates seven state-of-the-art models on these tasks, provides insights into their strengths and weaknesses in cybersecurity contexts, and offers recommendations…

Code & Models

Repositories

aiforsec/secure
none · Official

Datasets

RISys-Lab/Benchmarks_CyberSec_SECURE
dataset · 45 dl

Taxonomy

Topics: Digital and Cyber Forensics · Information and Cyber Security · Data Quality and Management