# Enhancing queries for code generation with reinforcement learning

**Authors:** Dawei Yuan, Guojun Liang, Tingting Li, Suping Liu

PMC · DOI: 10.1038/s41598-025-21271-4 · 2025-10-24

## TL;DR

This paper introduces a reinforcement learning method that refines natural language queries to improve code generation, raising code similarity by 34.3% on the DS1000 benchmark.

## Contribution

A novel reinforcement learning framework, RL4QE, that trains a LoRA-adapted query refiner to rewrite code generation queries, rewarded by a combination of text-similarity and execution signals.
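
The combined reward can be pictured as a weighted sum of a text-similarity term and an execution term, minus penalties. This is a minimal sketch under stated assumptions, not the authors' implementation: the weights `w_text` and `w_exec`, the penalty sizes, and the helper names `token_f1`, `has_syntax_error`, and `combined_reward` are hypothetical. Token-level F1 stands in for the text reward (BLEU-4, ROUGE-L, and Overlap are the other options named in the paper), and unit-test results and timeouts are passed in rather than computed, since producing them requires a sandbox.

```python
import ast
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Token-level F1 over whitespace tokens (one of the text rewards named in the paper)."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts tokens shared by candidate and reference.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def has_syntax_error(code: str) -> bool:
    """True if the generated Python code does not parse."""
    try:
        ast.parse(code)
        return False
    except SyntaxError:
        return True


def combined_reward(generated: str, reference: str, tests_passed: int, tests_total: int,
                    timed_out: bool, w_text: float = 0.5, w_exec: float = 0.5,
                    syntax_penalty: float = 1.0, timeout_penalty: float = 1.0) -> float:
    # Hypothetical weights and penalties; the paper's values are not reproduced here.
    text_score = token_f1(generated, reference)
    exec_score = tests_passed / tests_total if tests_total > 0 else 0.0
    reward = w_text * text_score + w_exec * exec_score
    if has_syntax_error(generated):
        reward -= syntax_penalty
    if timed_out:
        reward -= timeout_penalty
    return reward


print(combined_reward("x = a + b", "x = a + b", tests_passed=3, tests_total=4, timed_out=False))  # 0.875
print(combined_reward("x = a +", "x = a + b", tests_passed=0, tests_total=4, timed_out=False))
```

The second call ends up negative because the code does not parse, even though most of its tokens match the reference. This illustrates why execution signals complement pure text similarity.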

## Key findings

- RL4QE improves code similarity by 34.3% on the DS1000 benchmark.
- BLEU-4 is the most reliable text reward, and LoRA with rank 8 outperforms full fine-tuning on most metrics.
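
Part of the appeal of a rank-8 adapter is simple arithmetic. LoRA expresses the update to a $d_{out} \times d_{in}$ weight as a product $BA$ of a $d_{out} \times r$ and an $r \times d_{in}$ matrix, so it trains $r(d_{in} + d_{out})$ parameters instead of $d_{in} d_{out}$. The sketch below counts both for the four attention projections. The hidden size, layer count, and square q/k/v/o projections are hypothetical values chosen for illustration, not the configuration of any model in the paper.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # LoRA trains A (rank x d_in) and B (d_out x rank); the base weight stays frozen.
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    # Full fine-tuning updates every entry of the d_out x d_in weight.
    return d_in * d_out


HIDDEN = 2048   # hypothetical hidden size
N_LAYERS = 24   # hypothetical number of transformer layers
N_PROJ = 4      # q, k, v, o projections, assumed square (HIDDEN x HIDDEN)
RANK = 8

full = N_LAYERS * N_PROJ * full_params(HIDDEN, HIDDEN)
lora = N_LAYERS * N_PROJ * lora_params(HIDDEN, HIDDEN, RANK)
print(full)                # 402653184
print(lora)                # 3145728
print(f"{lora / full:.2%}")  # 0.78%
```

For square projections the ratio simplifies to $2r/d$, so under these assumptions rank 8 trains under 1% of the attention-projection parameters that full fine-tuning would update.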

## Abstract

We present a reinforcement learning framework that enhances natural language queries to improve DeepSeek code generation. A parametric refiner (Qwen with LoRA) is trained via REINFORCE while the generator remains fixed, using a scalar reward that can combine text similarity (BLEU-4, ROUGE-L, F1, Overlap) with execution signals (unit tests, syntax/timeout penalties). On the DS1000 benchmark (800 train / 200 test), RL4QE improves code similarity by 34.3%. Ablations show that BLEU-4 is the most reliable text reward overall (with F1 competitive at larger scale), and LoRA with rank $r = 8$ outperforms full fine-tuning on most metrics while being more parameter-efficient. The approach transfers across foundation models (e.g., Qwen1.5/2/2.5 variants), where architecture often matters more than size. RL4QE is easy to integrate in practice (LoRA in attention projections) and supports reproducibility.
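
The training setup described above, where only the refiner is updated and the generator's output enters solely through a scalar reward, follows the standard REINFORCE pattern. The toy sketch below is a deliberate simplification: it replaces the Qwen+LoRA refiner with a softmax policy over three hypothetical candidate rewrites, and replaces the frozen DeepSeek generator plus evaluation with a fixed reward per candidate. The moving-average baseline, learning rate, step count, and reward values are assumptions, not the paper's settings.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards, steps=3000, lr=0.1, baseline_decay=0.9, seed=0):
    """REINFORCE with a moving-average baseline over a fixed set of candidate rewrites.

    rewards[i] stands in for the score the frozen generator earns when given rewrite i
    (hypothetical: in practice this requires generating and evaluating code).
    """
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # trainable logits (stand-in for the LoRA refiner)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(rewards)), weights=probs)[0]  # sample a rewrite
        r = rewards[a]  # scalar reward from the fixed generator; no gradient flows through it
        advantage = r - baseline
        # d/d theta[j] of log softmax(theta)[a] is 1[j == a] - probs[j]
        for j in range(len(theta)):
            theta[j] += lr * advantage * ((1.0 if j == a else 0.0) - probs[j])
        baseline = baseline_decay * baseline + (1 - baseline_decay) * r
    return softmax(theta)


candidate_rewards = [0.1, 0.3, 0.9]  # hypothetical rewards for three rewrites
probs = reinforce_bandit(candidate_rewards)
print([round(p, 3) for p in probs])
```

After training, the policy should concentrate its probability on the highest-reward rewrite. Because the reward is treated as a constant, no gradient passes through the generator, which is what allows it to stay fixed. In the real setting the log-probability of an action would be the sum of the log-probabilities of the refined query's tokens.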

## Figures

The complete paper contains 10 figures with captions: https://tomesphere.com/paper/PMC12552436/full.md

---
Source: https://tomesphere.com/paper/PMC12552436