SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity

Pengfei Jing; Mengyun Tang; Xiaorong Shi; Xing Zheng; Sen Nie; Shi Wu; Yong Yang; Xiapu Luo

arXiv:2412.20787·cs.CR·January 7, 2025

Open Access 2 Datasets

TL;DR

SecBench is a large, multi-dimensional dataset designed to evaluate LLMs specifically in cybersecurity, covering various question types, languages, and difficulty levels to better assess domain-specific capabilities.

Contribution

This paper introduces SecBench, which the authors present as the largest and most comprehensive cybersecurity benchmark dataset for LLMs, with diverse question formats, languages, and expert-level content.

Findings

01

Benchmarking 16 SOTA LLMs demonstrates SecBench's usability.

02

SecBench includes over 47,000 cybersecurity questions in multiple formats and languages.

03

Automatic evaluation methods enable scalable assessment of LLM performance.

Abstract

Evaluating Large Language Models (LLMs) is crucial for understanding their capabilities and limitations across various applications, including natural language processing and code generation. Existing benchmarks like MMLU, C-Eval, and HumanEval assess general LLM performance but lack focus on specific expert domains such as cybersecurity. Previous attempts to create cybersecurity datasets have faced limitations, including insufficient data volume and a reliance on multiple-choice questions (MCQs). To address these gaps, we propose SecBench, a multi-dimensional benchmarking dataset designed to evaluate LLMs in the cybersecurity domain. SecBench includes questions in various formats (MCQs and short-answer questions (SAQs)), at different capability levels (Knowledge Retention and Logical Reasoning), in multiple languages (Chinese and English), and across various sub-domains. The dataset…
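The dimensions listed in the abstract (format, capability level, language, sub-domain) suggest reporting scores per dimension rather than as a single number. The following is a minimal sketch of that idea for MCQs only, not the authors' evaluation code. The `McqItem` record, its field names, and the `accuracy_by` helper are hypothetical, and the abstract does not specify the dataset's actual schema. Short-answer questions need a different grading method and are not covered here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class McqItem:
    # Hypothetical record layout; field names are assumptions that mirror
    # the dimensions named in the abstract, not the real dataset schema.
    question: str
    options: tuple
    answer: str    # gold option label, e.g. "B"
    language: str  # e.g. "zh" or "en"
    level: str     # e.g. "knowledge_retention" or "logical_reasoning"
    domain: str    # sub-domain label


def accuracy_by(items, predictions, key):
    """Group MCQ items by the attribute `key` and return accuracy per group."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        group = getattr(item, key)
        total[group] += 1
        # Compare normalized option labels, so " b" matches "B".
        correct[group] += int(pred.strip().upper() == item.answer)
    return {group: correct[group] / total[group] for group in total}
```

Calling `accuracy_by(items, predictions, "language")` would give one accuracy per language, and the same call with `"level"` or `"domain"` slices the results along the other dimensions.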

Taxonomy

Topics: Data Quality and Management · Cloud Data Security Solutions · Privacy-Preserving Technologies in Data

Methods: Focus