CodeGrad: Integrating Multi-Step Verification with Gradient-Based LLM Refinement
Yueke Zhang, Yifan Zhang, Kevin Leach, Yu Huang

TL;DR
CodeGrad is a framework that integrates verification into an iterative, gradient-style refinement loop for LLM code generation. It aims to produce code that is more correct, robust, and mathematically justified, and it outperforms strong baselines on benchmarks including HumanEval and LiveCodeBench.
Contribution
It treats code as a differentiable variable: structured feedback and mathematical constraints are converted into a textual pseudo-gradient that guides iterative refinement of the solution.
Findings
Achieves up to 27% absolute improvement on HumanEval
Attains a 41% relative improvement on LiveCodeBench V6
Generates mathematically justified and robust code
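The figures above mix two kinds of measurement: the abstract describes the 27% as an absolute improvement, while the LiveCodeBench figure is relative. As a reminder of the difference, the block below gives the usual definitions. The symbols s_base and s_new (baseline and new benchmark scores) are notation introduced here, not taken from the paper.

```latex
\[
\Delta_{\text{abs}} = s_{\text{new}} - s_{\text{base}},
\qquad
\Delta_{\text{rel}} = \frac{s_{\text{new}} - s_{\text{base}}}{s_{\text{base}}}
\]
```

For a purely illustrative case (numbers not from the paper), a score that rises from 40% to 50% is a 10-point absolute improvement and a 25% relative improvement, so the two headline figures are not directly comparable.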
Abstract
While Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, they often produce solutions that lack guarantees of correctness, robustness, and efficiency. This limitation is particularly acute in domains requiring strict constraints. CodeGrad introduces a principled framework that integrates rigorous verification techniques directly into an iterative LLM-based generation loop. It uniquely treats code as a differentiable variable, converting structured feedback and mathematical constraints into a textual pseudo-gradient. This gradient guides the model to iteratively refine solutions, ensuring they are not only functional but also robust and mathematically justified. We evaluate CodeGrad on the HumanEval, HumanEval+, and LiveCodeBench benchmarks. Our implementation outperforms strong baselines, achieving an absolute improvement of up to 27% on…
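The loop described in the abstract (verify, turn the feedback into a textual pseudo-gradient, refine, repeat) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `verify`, `pseudo_gradient`, `refine`, and `toy_model` are hypothetical. Plain test cases stand in for the paper's verification techniques, and a hard-coded stub stands in for the LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Feedback:
    passed: bool
    messages: List[str]


def verify(source: str, func_name: str, cases: List[Tuple[tuple, object]]) -> Feedback:
    """Run the candidate against test cases and collect textual failure reports.

    Assumption: simple input/output cases stand in for richer verification.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)  # load the candidate definition
    except Exception as exc:
        return Feedback(False, [f"code failed to load: {exc!r}"])
    func = namespace.get(func_name)
    if not callable(func):
        return Feedback(False, [f"function {func_name!r} is not defined"])
    messages = []
    for args, expected in cases:
        try:
            got = func(*args)
        except Exception as exc:
            messages.append(f"{func_name}{args} raised {exc!r}")
            continue
        if got != expected:
            messages.append(f"{func_name}{args} returned {got!r}, expected {expected!r}")
    return Feedback(not messages, messages)


def pseudo_gradient(feedback: Feedback) -> str:
    """Turn structured feedback into a textual instruction (the 'gradient')."""
    return "Revise the code so that these failures disappear: " + "; ".join(feedback.messages)


def refine(source: str, func_name: str, cases, propose: Callable[[str, str], str],
           max_steps: int = 3):
    """Alternate verification and model-proposed revisions until the checks pass."""
    history = []
    for _ in range(max_steps):
        feedback = verify(source, func_name, cases)
        if feedback.passed:
            break
        gradient = pseudo_gradient(feedback)
        history.append(gradient)
        source = propose(source, gradient)  # the "update step"
    return source, history


def toy_model(source: str, gradient: str) -> str:
    """Hypothetical stand-in for an LLM call: patches one known failure mode."""
    if "ZeroDivisionError" in gradient:
        return "def safe_div(a, b):\n    return None if b == 0 else a / b\n"
    return source


DRAFT = "def safe_div(a, b):\n    return a / b\n"
CASES = [((6, 3), 2.0), ((1, 0), None)]

final_source, history = refine(DRAFT, "safe_div", CASES, toy_model)
print(history[0])
print(final_source)
```

In a real system `toy_model` would be replaced by a prompt to an LLM that includes the current code and the pseudo-gradient text. The loop structure, with verification producing the signal that drives each revision, is the part this sketch is meant to show.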
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Artificial Intelligence in Healthcare and Education
