LLMSecCode: Evaluating Large Language Models for Secure Coding
Anton Rydén, Erik Näslund, Elad Michael Schiller, Magnus Almgren

TL;DR
This paper introduces LLMSecCode, an open-source framework for objectively evaluating the cybersecurity capabilities of large language models in secure coding, aiming to standardize benchmarking and improve selection processes.
Contribution
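The paper presents an open-source evaluation framework for assessing LLMs in secure coding. It addresses three research questions (which functionality streamlines evaluation, what the evaluation should measure, and how to attest that the process is impartial) and validates the implementation through experiments.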
Findings
Performance differs by 10% when parameters are varied and by 9% when prompts are varied.
Results compared against reliable external actors show a 5% difference.
The framework facilitates standardized benchmarking of LLMs on secure coding tasks.
Abstract
The rapid deployment of Large Language Models (LLMs) requires careful consideration of their effect on cybersecurity. Our work aims to improve the selection process of LLMs that are suitable for facilitating Secure Coding (SC). This raises challenging research questions, such as (RQ1) Which functionality can streamline the LLM evaluation? (RQ2) What should the evaluation measure? (RQ3) How to attest that the evaluation process is impartial? To address these questions, we introduce LLMSecCode, an open-source evaluation framework designed to assess LLM SC capabilities objectively. We validate the LLMSecCode implementation through experiments. When varying parameters and prompts, we find a 10% and 9% difference in performance, respectively. We also compare some results to reliable external actors, where our results show a 5% difference. We strive to ensure the ease of use of our…
