# Towards Better Correctness and Efficiency in Code Generation

**Authors:** Yunlong Feng, Yang Xu, Xiao Xu, Binyuan Hui, Junyang Lin

arXiv: 2508.20124 · 2025-08-29

## TL;DR

This paper introduces an efficiency-oriented reinforcement learning framework for code generation models. It improves the runtime efficiency of generated code without sacrificing correctness by combining online (dynamic) exploration with a performance reward built on high-contrast efficiency signals.

## Contribution

It proposes a two-stage tuning method that balances correctness and efficiency. On a 7B model, the method improves code correctness by 10.18% and runtime efficiency by 7.75%.

## Key findings

- Code correctness improves by 10.18% on a 7B model
- Runtime efficiency improves by 7.75% on the same model
- The tuned 7B model performs comparably to a much larger model

## Abstract

While code large language models have demonstrated remarkable progress in code generation, the generated code often exhibits poor runtime efficiency, limiting its practical application in performance-sensitive scenarios. To address this limitation, we propose an efficiency-oriented reinforcement learning framework guided by a novel performance reward. Based on this framework, we take a deeper dive into the code efficiency problem, identifying key bottlenecks and proposing methods to overcome them: (1) Dynamic exploration overcomes the static data constraints of offline fine-tuning, enabling the discovery of more efficient code implementations. (2) An error-insensitive reinforcement learning method and high-contrast efficiency signals are crucial for mitigating systematic errors and achieving effective optimization. (3) Online exploration is most effective when starting from a high-correctness baseline, as this allows for efficiency improvements without sacrificing accuracy. Building on these findings, we propose a two-stage tuning method that achieves high and balanced performance across correctness and efficiency. Experimental results show the effectiveness of the method, which improves code correctness by 10.18% and runtime efficiency by 7.75% on a 7B model, achieving performance comparable to a much larger model.
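The abstract does not spell out how the performance reward is computed, so the following is only an illustrative sketch of one way a correctness-gated efficiency reward could look. Everything here is an assumption made for illustration: the names `reward_from_times`, `measure_runtime`, and `efficiency_reward`, the use of a reference solution as the timing baseline, and the squashing function `speedup / (1 + speedup)` are hypothetical and are not taken from the paper.

```python
import time
from typing import Any, Callable, Sequence, Tuple

TestCase = Tuple[tuple, Any]  # (positional args, expected output)


def reward_from_times(passed: bool, cand_time: float, ref_time: float) -> float:
    """Hypothetical reward: 0.0 for incorrect code, otherwise a value in (0, 1).

    A candidate exactly as fast as the reference scores 0.5; faster
    candidates score higher and slower ones score lower.
    """
    if not passed:
        return 0.0  # correctness gate: efficiency counts only for passing code
    speedup = ref_time / max(cand_time, 1e-9)  # guard against division by zero
    return speedup / (1.0 + speedup)


def measure_runtime(fn: Callable, args: tuple, repeats: int = 5) -> float:
    """Best-of-N wall-clock time in seconds for fn(*args), to reduce noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def efficiency_reward(candidate: Callable, reference: Callable,
                      tests: Sequence[TestCase], repeats: int = 5) -> float:
    """Check the candidate on every test, then compare total runtimes."""
    cand_time = ref_time = 0.0
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return reward_from_times(False, 0.0, 0.0)
        except Exception:
            # A crashing candidate is treated the same as a wrong answer.
            return reward_from_times(False, 0.0, 0.0)
        cand_time += measure_runtime(candidate, args, repeats)
        ref_time += measure_runtime(reference, args, repeats)
    return reward_from_times(True, cand_time, ref_time)


def slow_sum(n: int) -> int:
    """Reference solution: O(n) loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def fast_sum(n: int) -> int:
    """Candidate solution: closed form, O(1)."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    tests = [((1000,), 499500), ((10,), 45)]
    print(efficiency_reward(fast_sum, slow_sum, tests))
```

Gating on correctness first mirrors the abstract's point that efficiency gains should not come at the cost of accuracy. A real training setup would also need sandboxed execution and more robust timing than this wall-clock measurement.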

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20124/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20124/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/2508.20124/full.md

---
Source: https://tomesphere.com/paper/2508.20124