SelfRACG: Enabling LLMs to Self-Express and Retrieve for Code Generation

Qian Dong; Jia Chen; Qingyao Ai; Hongning Wang; Haitao Li; Yi Wu; Yao Hu; Yiqun Liu; Shaoping Ma

arXiv:2507.19033 · cs.IR · October 10, 2025

TL;DR

SelfRACG introduces a novel approach where large language models self-express their information needs to improve retrieval-augmented code generation, leading to better code quality and relevance.

Contribution

The paper proposes a new paradigm in which LLMs self-express their information needs, with the aim of improving retrieval accuracy and code generation performance in RACG systems.

Findings

01. SelfRACG outperforms traditional RACG methods in code generation tasks.

02. The approach effectively aligns retrieved knowledge with LLMs' specific information needs.

03. Experimental results show improved code quality and relevance.

Abstract

Existing retrieval-augmented code generation (RACG) methods typically use an external retrieval module to fetch semantically similar code snippets used for generating subsequent fragments. However, even for consecutive code fragments, the content often diverges due to logical progression, resulting in a content gap. This gap undermines the performance of current RACG methods, as external retrieval modules based on content matching fail to infer the specific information need of LLMs to generate the next code fragment. Therefore, we propose SelfRACG, a novel paradigm that enables large language models (LLMs) to Self-express their information needs to enhance RACG. Specifically, SelfRACG includes an information need expression module and a two-stage information need-guided training strategy, which encourages LLMs to express their information need.…
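
The content gap described in the abstract can be illustrated with a minimal sketch. It assumes a toy external retriever that ranks snippets by token overlap (Jaccard similarity) with the current code fragment. The function names, the corpus, and the similarity measure are all hypothetical choices made for illustration; this is not SelfRACG's method or any retriever used in the paper.

```python
import re


def tokenize(code: str) -> set[str]:
    """Split a code fragment into a set of lowercase identifier-like tokens."""
    return set(re.findall(r"[A-Za-z_]\w*", code.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity between two token sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query_fragment: str, corpus: list[str], k: int = 1) -> list[str]:
    """Hypothetical content-matching retriever: rank snippets by overlap with the query."""
    q = tokenize(query_fragment)
    ranked = sorted(corpus, key=lambda s: jaccard(q, tokenize(s)), reverse=True)
    return ranked[:k]


# Toy corpus (assumed for illustration only).
corpus = [
    "def load_config(path):\n    with open(path) as f:\n        return json.load(f)",
    "def connect_db(config):\n    return sqlite3.connect(config['db_path'])",
]

# The fragment just written loads a config; the next fragment would
# plausibly open a database connection using that config.
current_fragment = "config = load_config(path)"
top = retrieve(current_fragment, corpus)
print(top[0].splitlines()[0])
```

In this toy setting the retriever returns the `load_config` snippet, because it shares the most tokens with the fragment already written, even though the next fragment would more plausibly need `connect_db`. That mismatch between what content matching finds and what the model needs next is the kind of gap the abstract attributes to external, content-based retrieval.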

Taxonomy

Topics: Topic Modeling · Software Engineering Research · Natural Language Processing Techniques