DUALGAUGE: Automated Joint Security-Functionality Benchmarking for Secure Code Generation

Abhijeet Pathak; Suvadra Barua; Dinesh Gudimetla; Rupam Patir; Jiawei Guo; Hongxin Hu; Haipeng Cai

arXiv:2511.20709·cs.SE·November 27, 2025

TL;DR

DUALGAUGE is an automated framework that jointly evaluates the security and correctness of code generated by large language models, addressing a critical gap in secure software development.

Contribution

The paper introduces DUALGAUGE, the first fully automated benchmarking framework for simultaneous security and correctness evaluation of LLM-generated code, along with DUALGAUGE-BENCH, a curated dataset.

Findings

01. Identified significant gaps in security and correctness in current LLMs
02. Provided a scalable, reproducible evaluation system for secure code generation
03. Published open-source tools and datasets to advance research in secure AI coding

Abstract

Large language models (LLMs) and autonomous coding agents are increasingly used to generate software across a wide range of domains. Yet a core requirement remains unmet: ensuring that generated code is secure without compromising its functional correctness. Existing benchmarks and evaluations for secure code generation fall short: many measure only vulnerability reduction, disregard correctness preservation, or evaluate security and functionality on separate datasets, violating the fundamental need for simultaneous joint evaluation. We present DUALGAUGE, the first fully automated benchmarking framework designed to rigorously evaluate the security and correctness of LLM-generated code in unison. Given the lack of datasets enabling joint evaluation of secure code generation, we also present DUALGAUGE-BENCH, a curated benchmark suite of diverse coding tasks, each paired with manually…
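To illustrate why the abstract insists on joint evaluation, here is a minimal sketch in Python. It is not DUALGAUGE's actual scoring code or metric definition: the `SampleResult` record, the `summarize` function, and the sample data are all hypothetical, chosen only to show how separate security and functionality pass rates can overstate the share of generated programs that are both secure and correct.

```python
from dataclasses import dataclass


@dataclass
class SampleResult:
    """Hypothetical record: outcome of one generated program on both test suites."""
    task_id: str
    passed_functional: bool
    passed_security: bool


def summarize(results):
    """Return functional, security, and joint pass rates as fractions of all samples."""
    total = len(results)
    if total == 0:
        raise ValueError("no results to summarize")
    functional = sum(r.passed_functional for r in results)
    security = sum(r.passed_security for r in results)
    # A sample counts toward the joint rate only if it passes BOTH suites.
    joint = sum(r.passed_functional and r.passed_security for r in results)
    return {
        "functional": functional / total,
        "security": security / total,
        "joint": joint / total,
    }


# Made-up outcomes for four tasks (illustrative only, not data from the paper).
results = [
    SampleResult("t1", True, True),
    SampleResult("t2", True, False),
    SampleResult("t3", False, True),
    SampleResult("t4", True, True),
]
rates = summarize(results)
print(rates)
```

In this toy data, three of four samples pass each suite on its own, yet only two of four pass both, which is the kind of gap that evaluating security and functionality on separate datasets cannot reveal.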

Taxonomy

Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Software Engineering Research