Structured Safety Auditing for Balancing Code Correctness and Content Safety in LLM-Generated Code
Honghao Tan, Haibo Wang, Shin Hwei Tan

TL;DR
This paper introduces a structured safety auditing method and a new metric, SUDS, to balance code correctness and safety in LLM-generated code, demonstrating improved safety performance across models.
Contribution
It proposes the Dual Reasoning technique and the SUDS metric to unify safety and utility assessment, advancing responsible code generation in LLMs.
Findings
DR achieves highest SUDS scores across models.
DR's effectiveness increases with model capacity.
Structured reasoning complements safety vocabularies limitations.
Abstract
Large language models (LLMs) for code generation are typically evaluated on functional correctness alone, overlooking whether generated code propagates harmful content embedded in the prompt. Prior work has shown that most Code LLMs reproduce offensive identifiers from injected renaming instructions without warning, yet existing approaches focus on detecting harmful content, neglecting functional correctness. Grounded in the Theory of Dual Channel Constraints (which states that code is a dual-channel medium combining an algorithmic (AL) channel for machine execution and a natural language (NL) channel for human communication, creating a unique safety-utility trade-off where a model must balance functional execution with responsible communication), we propose NLSafety-Utility Duality Score (SUDS), a metric that unifies code utility, safety adherence, and warning awareness into a single…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
