SecQA: A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security

Zefang Liu

arXiv:2312.15838 · cs.CL · December 27, 2023 · 6 cites

TL;DR

SecQA is a new dataset designed to evaluate large language models' understanding of computer security through multiple-choice questions, revealing their strengths and limitations across different models and difficulty levels.

Contribution

We created SecQA, a novel security-focused question-answering dataset, and conducted comprehensive evaluations of prominent LLMs, establishing a benchmark for future research in this domain.

Findings

01 LLMs show varying performance levels on SecQA questions.

02 GPT-4 outperforms other models in security understanding.

03 Model capabilities decrease with increased question complexity.

Abstract

In this paper, we introduce SecQA, a novel dataset tailored for evaluating the performance of Large Language Models (LLMs) in the domain of computer security. Utilizing multiple-choice questions generated by GPT-4 based on the "Computer Systems Security: Planning for Success" textbook, SecQA aims to assess LLMs' understanding and application of security principles. We detail the structure and intent of SecQA, which includes two versions of increasing complexity, to provide a concise evaluation across various difficulty levels. Additionally, we present an extensive evaluation of prominent LLMs, including GPT-3.5-Turbo, GPT-4, Llama-2, Vicuna, Mistral, and Zephyr models, using both 0-shot and 5-shot learning settings. Our results, encapsulated in the SecQA v1 and v2 datasets, highlight the varying capabilities and limitations of these models in the computer security context. This study…
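The 0-shot and 5-shot settings mentioned in the abstract differ only in how many solved examples precede the target question in the prompt. The sketch below is one minimal way to build such prompts and score the answers; it is not the paper's evaluation code. The field names `question`, `choices`, and `answer`, the helper names, and the two sample questions are all assumptions made up for illustration and may not match the actual SecQA schema.

```python
from typing import Dict, List, Sequence

LETTERS = "ABCD"  # assumption: four answer options per question

# Hypothetical items, invented for illustration (not taken from SecQA).
SHOTS: List[Dict] = [
    {
        "question": "Which property ensures data is not altered without authorization?",
        "choices": ["Availability", "Integrity", "Confidentiality", "Non-repudiation"],
        "answer": "B",
    },
]
TARGET: Dict = {
    "question": "Which control limits users to the minimum access they need?",
    "choices": ["Least privilege", "Defense in depth", "Obfuscation", "Redundancy"],
    "answer": "A",
}


def format_question(item: Dict, include_answer: bool) -> str:
    """Render one multiple-choice item, optionally with its gold answer."""
    lines = [f"Question: {item['question']}"]
    for letter, choice in zip(LETTERS, item["choices"]):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:" + (f" {item['answer']}" if include_answer else ""))
    return "\n".join(lines)


def build_prompt(examples: Sequence[Dict], target: Dict, k: int) -> str:
    """Prepend the first k solved examples (k=0 gives a 0-shot prompt)."""
    shots = [format_question(ex, include_answer=True) for ex in examples[:k]]
    return "\n\n".join(shots + [format_question(target, include_answer=False)])


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of predictions whose first letter matches the gold answer."""
    if not answers or len(predictions) != len(answers):
        raise ValueError("predictions and answers must be non-empty and equal length")
    correct = sum(p.strip().upper()[:1] == a for p, a in zip(predictions, answers))
    return correct / len(answers)


if __name__ == "__main__":
    print(build_prompt(SHOTS, TARGET, k=1))
```

Calling `build_prompt(SHOTS, TARGET, k=0)` yields only the target question followed by an empty `Answer:` slot, while a 5-shot prompt would pass five solved examples and `k=5`. The model's completion would then be compared against the gold letter with `accuracy`.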

Code & Models

Repositories

zefang-liu/lm-evaluation-harness

Datasets

zefang-liu/secqa
dataset· 660 dl

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Attention Dropout · Linear Layer · Dense Connections · Linear Warmup With Cosine Annealing