TOSSS: a CVE-based Software Security Benchmark for Large Language Models

Marc Damie; Murat Bilgehan Ertan; Domenico Essoussi; Angela Makhanu; Gaëtan Peter; Roos Wensveen

arXiv:2603.10969·cs.LG·March 17, 2026

Open Access

TL;DR

TOSSS is a new benchmark that uses the CVE database to assess whether large language models can distinguish secure code snippets from vulnerable ones. It reports a security score that indicates how safely a model behaves when used in software development.

Contribution

The paper introduces TOSSS, an extensible CVE-based benchmark for evaluating LLMs' security awareness in code selection, addressing limitations of existing security benchmarks.

Findings

01 Security scores ranged from 0.48 to 0.89 across the evaluated models.

02 TOSSS can complement existing LLM benchmarks with security assessments.

03 Because it is built on the CVE database, the framework can be extended to cover new vulnerabilities over time.

Abstract

With their increasing capabilities, Large Language Models (LLMs) are now used across many industries. They have become useful tools for software engineers and support a wide range of development tasks. As LLMs are increasingly used in software development workflows, a critical question arises: are LLMs good at software security? At the same time, organizations worldwide invest heavily in cybersecurity to reduce exposure to disruptive attacks. The integration of LLMs into software engineering workflows may introduce new vulnerabilities and weaken existing security efforts. We introduce TOSSS (Two-Option Secure Snippet Selection), a benchmark that measures the ability of LLMs to choose between secure and vulnerable code snippets. Existing security benchmarks for LLMs cover only a limited range of vulnerabilities. In contrast, TOSSS relies on the CVE database and provides an extensible…
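The abstract names the task (choose the secure snippet out of two) but not the scoring mechanics. The sketch below shows one plausible way a two-option trial and an aggregate score could be implemented. It assumes that each item pairs a vulnerable snippet with its patched counterpart, that the model answers "A" or "B", and that the score is the fraction of trials in which the secure snippet is chosen. The names `SnippetPair`, `run_trial`, and `security_score` are hypothetical, and the paper's actual prompt format and metric may differ.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class SnippetPair:
    """Hypothetical benchmark item: two versions of the same code."""
    item_id: str
    vulnerable: str  # assumption: the code before a CVE fix
    secure: str      # assumption: the same code after the fix


# A chooser receives option A and option B and answers "A" or "B".
# In a real harness this would wrap a prompt sent to an LLM.
Chooser = Callable[[str, str], str]


def run_trial(pair: SnippetPair, choose: Chooser, rng: random.Random) -> bool:
    """Show both snippets in random order; return True if the secure one is picked."""
    secure_is_a = rng.random() < 0.5  # assumed: randomize position to avoid order bias
    if secure_is_a:
        option_a, option_b = pair.secure, pair.vulnerable
    else:
        option_a, option_b = pair.vulnerable, pair.secure
    answer = choose(option_a, option_b).strip().upper()
    if answer not in ("A", "B"):
        return False  # unparseable answers count as failures
    return (answer == "A") == secure_is_a


def security_score(pairs: Sequence[SnippetPair], choose: Chooser, seed: int = 0) -> float:
    """Assumed metric: fraction of items where the secure snippet is chosen (0.0 to 1.0)."""
    if not pairs:
        raise ValueError("need at least one snippet pair")
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    correct = sum(run_trial(p, choose, rng) for p in pairs)
    return correct / len(pairs)
```

Randomizing the option order on each trial is a design choice assumed here so that a model that always answers "A" cannot score perfectly; on a large set such a model lands near 0.5, which under this assumed metric acts as the chance baseline.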


Taxonomy

Topics: Information and Cyber Security · Software Engineering Research · Web Application Security Vulnerabilities