SALLM: Security Assessment of Generated Code
Mohammed Latif Siddiq, Joanna C. S. Santos, Sajith Devareddy, Anna Muller

TL;DR
This paper introduces SALLM, a comprehensive framework for benchmarking large language models' ability to generate secure code, addressing gaps in existing datasets and evaluation metrics that overlook security aspects.
Contribution
SALLM provides a new dataset, assessment techniques, and metrics specifically designed to evaluate the security quality of code generated by LLMs.
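As a rough illustration of what a security-focused metric can look like, the sketch below adapts the widely used unbiased pass@k estimator so that it counts vulnerable samples instead of passing ones. It assumes that, for each prompt, n completions were generated and v of them were judged vulnerable by some checker (a test or a static analyzer; the checker is not specified here). The function names vulnerable_at_k, secure_at_k and benchmark_mean, and their exact definitions, are illustrative assumptions, not necessarily the metrics SALLM defines.

```python
from math import comb


def secure_at_k(n: int, v: int, k: int) -> float:
    """Hypothetical metric: estimated probability that all k samples, drawn
    without replacement from n generations (v of them vulnerable), are secure."""
    if not (0 <= v <= n and 1 <= k <= n):
        raise ValueError("require 0 <= v <= n and 1 <= k <= n")
    # comb(n - v, k) counts the k-subsets made only of secure samples.
    # math.comb returns 0 when k > n - v, so the estimate is then 0.0.
    return comb(n - v, k) / comb(n, k)


def vulnerable_at_k(n: int, v: int, k: int) -> float:
    """Hypothetical metric: estimated probability that at least one of the
    k samples is vulnerable (same form as the unbiased pass@k estimator)."""
    return 1.0 - secure_at_k(n, v, k)


def benchmark_mean(per_prompt: list[tuple[int, int]], k: int, metric) -> float:
    """Average a per-prompt metric over a benchmark of (n, v) pairs."""
    return sum(metric(n, v, k) for n, v in per_prompt) / len(per_prompt)


# Example: 10 generations for one prompt, 2 of them flagged as vulnerable.
print(round(vulnerable_at_k(10, 2, 1), 4))  # 0.2
print(round(secure_at_k(10, 2, 5), 4))      # 0.2222
```

Averaging per-prompt estimates, as benchmark_mean does, mirrors how pass@k is usually reported; under these assumed definitions a lower vulnerable_at_k and a higher secure_at_k indicate more secure generations.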
Findings
SALLM enables systematic benchmarking of secure code generation.
The framework highlights security weaknesses in current LLM-generated code.
It offers tools to improve security-aware code generation by LLMs.
Abstract
With the growing popularity of Large Language Models (LLMs) in software engineers' daily practices, it is important to ensure that the code generated by these tools is not only functionally correct but also free of vulnerabilities. Although LLMs can help developers be more productive, prior empirical studies have shown that LLMs can generate insecure code. Two factors contribute to insecure code generation. First, existing datasets used to evaluate LLMs do not adequately represent genuine software engineering tasks sensitive to security. Instead, they are often based on competitive programming challenges or classroom-type coding tasks. In real-world applications, the generated code is integrated into larger codebases, introducing potential security risks. Second, existing evaluation metrics primarily focus on the functional correctness of the generated code while…
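The point that functional-correctness metrics can miss security problems can be illustrated with a small, hypothetical Python example (not taken from the paper). Two user lookups behave identically on ordinary inputs, so a functional test passes both, but only the parameterized query resists SQL injection (CWE-89). The table, data, and test functions are invented for illustration.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])
    return conn


def get_secret_insecure(conn, name):
    # Builds SQL by string interpolation: vulnerable to SQL injection.
    row = conn.execute(f"SELECT secret FROM users WHERE name = '{name}'").fetchone()
    return row[0] if row else None


def get_secret_secure(conn, name):
    # Parameterized query: the driver treats `name` strictly as data.
    row = conn.execute("SELECT secret FROM users WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None


def functional_test(fn) -> bool:
    """Checks only the expected behavior on ordinary inputs."""
    conn = make_db()
    ok = fn(conn, "alice") == "a1" and fn(conn, "carol") is None
    conn.close()
    return ok


def security_test(fn) -> bool:
    """An injected payload must not leak any stored secret."""
    conn = make_db()
    ok = fn(conn, "' OR '1'='1") is None
    conn.close()
    return ok


print(functional_test(get_secret_insecure), security_test(get_secret_insecure))  # True False
print(functional_test(get_secret_secure), security_test(get_secret_secure))      # True True
```

A benchmark that only runs functional_test would score both versions as correct; only a security-oriented check separates them.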
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research
