Ollabench: Evaluating LLMs' Reasoning for Human-centric Interdependent Cybersecurity

Tam N. Nguyen

arXiv:2406.06863·cs.CR·June 12, 2024·1 cites

TL;DR

OllaBench is a new evaluation framework for assessing LLMs' reasoning in human-centric interdependent cybersecurity, focusing on accuracy, wastefulness, and consistency, with insights from cognitive theories and empirical data.

Contribution

The paper introduces OllaBench, a comprehensive, theory-based evaluation framework for LLMs in cybersecurity, addressing human factors often overlooked in prior assessments.

Findings

01. Commercial LLMs achieve higher accuracy but still have room for improvement.

02. Open-weight LLMs perform competitively in some aspects, especially in efficiency.

03. Significant differences exist among models in token efficiency and consistency.

Abstract

Large Language Models (LLMs) have the potential to enhance Agent-Based Modeling by better representing complex interdependent cybersecurity systems, improving cybersecurity threat modeling and risk management. However, evaluating LLMs in this context is crucial for legal compliance and effective application development. Existing LLM evaluation frameworks often overlook the human factor and cognitive computing capabilities essential for interdependent cybersecurity. To address this gap, I propose OllaBench, a novel evaluation framework that assesses LLMs' accuracy, wastefulness, and consistency in answering scenario-based information security compliance and non-compliance questions. OllaBench is built on a foundation of 24 cognitive behavioral theories and empirical evidence from 38 peer-reviewed papers. OllaBench was used to evaluate 21 LLMs, including both open-weight and commercial…

Code & Models

Repositories

cybonto/ollabench · Official

Datasets

theResearchNinja/violentutf_cybersecurityBehavior
dataset · 33 downloads

Taxonomy

Topics: Information and Cyber Security · Cybersecurity and Cyber Warfare Studies