DUALGAUGE: Automated Joint Security-Functionality Benchmarking for Secure Code Generation
Abhijeet Pathak, Suvadra Barua, Dinesh Gudimetla, Rupam Patir, Jiawei Guo, Hongxin Hu, Haipeng Cai

TL;DR
DUALGAUGE is an automated framework that jointly evaluates the security and correctness of code generated by large language models, addressing a critical gap in secure software development.
Contribution
It introduces DUALGAUGE, the first fully automated benchmarking framework for simultaneous security and correctness evaluation of LLM-generated code, along with a curated dataset, DUALGAUGE-BENCH.
Findings
Identified significant security and correctness gaps in code generated by current LLMs
Provided a scalable, reproducible evaluation system for secure code generation
Published open-source tools and datasets to advance research in secure AI coding
Abstract
Large language models (LLMs) and autonomous coding agents are increasingly used to generate software across a wide range of domains. Yet a core requirement remains unmet: ensuring that generated code is secure without compromising its functional correctness. Existing benchmarks and evaluations for secure code generation fall short: many measure only vulnerability reduction, disregard correctness preservation, or evaluate security and functionality on separate datasets, violating the fundamental need for simultaneous joint evaluation. We present DUALGAUGE, the first fully automated benchmarking framework designed to rigorously evaluate the security and correctness of LLM-generated code in unison. Given the lack of datasets enabling joint evaluation of secure code generation, we also present DUALGAUGE-BENCH, a curated benchmark suite of diverse coding tasks, each paired with manually…
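To illustrate why joint evaluation matters, here is a minimal sketch. It is not DUALGAUGE's actual metric or API. It assumes each generated sample has been run against both functional tests and security tests for the same task; the `SampleResult` schema and the `rates` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SampleResult:
    """Outcome of one generated program on one task (hypothetical schema)."""
    task_id: str
    passes_functional: bool  # all functional (correctness) tests passed
    passes_security: bool    # all security tests passed


def rates(results: list[SampleResult]) -> dict[str, float]:
    """Compute separate and joint pass rates over the same set of samples."""
    n = len(results)
    if n == 0:
        raise ValueError("no results to score")
    functional = sum(r.passes_functional for r in results) / n
    secure = sum(r.passes_security for r in results) / n
    # Joint rate: a sample counts only if it is both correct AND secure.
    joint = sum(r.passes_functional and r.passes_security for r in results) / n
    return {"functional": functional, "secure": secure, "joint": joint}


if __name__ == "__main__":
    demo = [
        SampleResult("t1", True, False),   # correct but vulnerable
        SampleResult("t2", False, True),   # secure but functionally broken
        SampleResult("t3", True, True),    # correct and secure
        SampleResult("t4", False, False),  # neither
    ]
    print(rates(demo))
```

In this toy data each separate rate is 0.5, yet only one sample in four (0.25) is both correct and secure. Measuring the two properties on different datasets would hide this overlap, which is the gap that joint evaluation on shared tasks is meant to expose.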
Taxonomy
Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Software Engineering Research
