TOSSS: a CVE-based Software Security Benchmark for Large Language Models
Marc Damie, Murat Bilgehan Ertan, Domenico Essoussi, Angela Makhanu, Gaëtan Peter, Roos Wensveen

TL;DR
TOSSS is a new benchmark that assesses large language models' ability to distinguish secure code snippets from vulnerable ones using the CVE database, providing a security score to evaluate their safety in software development.
Contribution
The paper introduces TOSSS, an extensible CVE-based benchmark for evaluating LLMs' security awareness in code selection, addressing limitations of existing security benchmarks.
Findings
Scores ranged from 0.48 to 0.89 across models.
TOSSS can complement existing LLM benchmarks with security assessments.
The framework is adaptable to new vulnerabilities over time.
Abstract
With their increasing capabilities, Large Language Models (LLMs) are now used across many industries. They have become useful tools for software engineers and support a wide range of development tasks. As LLMs are increasingly used in software development workflows, a critical question arises: are LLMs good at software security? At the same time, organizations worldwide invest heavily in cybersecurity to reduce exposure to disruptive attacks. The integration of LLMs into software engineering workflows may introduce new vulnerabilities and weaken existing security efforts. We introduce TOSSS (Two-Option Secure Snippet Selection), a benchmark that measures the ability of LLMs to choose between secure and vulnerable code snippets. Existing security benchmarks for LLMs cover only a limited range of vulnerabilities. In contrast, TOSSS relies on the CVE database and provides an extensible…
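Example
The two-option setup described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation. It assumes the security score is the fraction of snippet pairs for which the model picks the secure option, which is consistent with the reported 0.48 to 0.89 range but is not spelled out in the text shown here. The names SnippetPair, security_score, and always_a are hypothetical, and the two toy pairs are invented for illustration rather than taken from the CVE database.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class SnippetPair:
    """One benchmark item: two versions of the same code (hypothetical layout)."""
    option_a: str
    option_b: str
    secure_option: str  # "A" or "B": which of the two options is the secure one


def security_score(
    pairs: Sequence[SnippetPair],
    choose: Callable[[str, str], str],
) -> float:
    """Return the fraction of pairs for which `choose` picks the secure option.

    Assumption: the score is plain accuracy over two-option items.
    """
    if not pairs:
        raise ValueError("need at least one snippet pair")
    correct = sum(
        1 for p in pairs if choose(p.option_a, p.option_b) == p.secure_option
    )
    return correct / len(pairs)


def always_a(option_a: str, option_b: str) -> str:
    """Stand-in for an LLM call: a trivial baseline that always answers "A"."""
    return "A"


# Toy items, invented for illustration only.
pairs = [
    SnippetPair(
        option_a='cur.execute("SELECT * FROM u WHERE id = %s", (uid,))',
        option_b='cur.execute("SELECT * FROM u WHERE id = " + uid)',
        secure_option="A",
    ),
    SnippetPair(
        option_a="strcpy(buf, src);",
        option_b="strncpy(buf, src, sizeof(buf) - 1); buf[sizeof(buf) - 1] = '\\0';",
        secure_option="B",
    ),
]

score = security_score(pairs, always_a)
print(f"security score: {score:.2f}")
```

Because the secure snippet sits in position A for one pair and position B for the other, a model that ignores the code and always answers the same letter lands at chance level. Under this assumed scoring, a result near 0.5 would therefore indicate little ability to tell the two snippets apart.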
Taxonomy
Topics: Information and Cyber Security · Software Engineering Research · Web Application Security Vulnerabilities
