Enhancing LLM Agents for Code Generation with Possibility and Pass-rate Prioritized Experience Replay
Yuyang Chen, Kaiyan Zhao, Yiming Wang, Ming Yang, Jian Zhang, Xiaoguang Niu

TL;DR
This paper introduces a novel experience replay method with a prioritized value based on possibility and pass-rate to improve the efficiency and performance of transformer-based LLMs in code generation tasks.
Contribution
It proposes the BTP pipeline that integrates prioritized experience replay using P2Value, enhancing code generation efficiency and accuracy in LLMs.
Findings
Improved code generation performance across multiple LLMs.
Higher pass rates and efficiency in generating correct code.
Outperforms existing baseline methods.
Abstract
Nowadays transformer-based Large Language Models (LLM) for code generation tasks usually apply sampling and filtering pipelines. Due to the sparse reward problem in code generation tasks caused by one-token incorrectness, transformer-based models will sample redundant programs till they find a correct one, leading to low efficiency. To overcome the challenge, we incorporate Experience Replay (ER) in the fine-tuning phase, where codes and programs produced are stored and will be replayed to give the LLM agent a chance to learn from past experiences. Based on the spirit of ER, we introduce a novel approach called BTP pipeline which consists of three phases: beam search sampling, testing phase, and prioritized experience replay phase. The approach makes use of failed programs collected by code models and replays programs with high Possibility and Pass-rate Prioritized value (P2Value) from…
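As a rough illustration of the replay phase described in the abstract, the sketch below ranks stored programs by a priority built from their possibility and their pass rate, then selects the top entries for replay. The abstract excerpt does not give the exact P2Value formula, so the weighted sum used here, the `alpha` weight, and the names `Experience`, `pass_rate`, `p2value`, and `select_for_replay` are all assumptions made for illustration, not the authors' definitions.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Experience:
    """One sampled program stored in the replay buffer (hypothetical structure)."""
    program: str
    possibility: float  # model-assigned probability of the program, in [0, 1]
    pass_rate: float    # fraction of test cases the program passed, in [0, 1]


def pass_rate(test_results):
    """Fraction of passed tests; an empty test list counts as 0.0."""
    return sum(test_results) / len(test_results) if test_results else 0.0


def p2value(exp, alpha=0.5):
    """ASSUMPTION: a convex combination of possibility and pass rate.

    The paper's actual P2Value definition is not shown in the excerpt.
    """
    return alpha * exp.possibility + (1.0 - alpha) * exp.pass_rate


def select_for_replay(buffer, k, alpha=0.5):
    """Return the k experiences with the highest assumed priority."""
    return heapq.nlargest(k, buffer, key=lambda e: p2value(e, alpha))


buffer = [
    Experience("prog_a", possibility=0.9, pass_rate=pass_rate([False, False])),
    Experience("prog_b", possibility=0.4, pass_rate=pass_rate([True, True, True, True, False])),
    Experience("prog_c", possibility=0.1, pass_rate=pass_rate([True, False, False, False, False])),
]
replayed = select_for_replay(buffer, k=2)
print([e.program for e in replayed])
```

Under this assumed weighting, a failed program that passes most tests can outrank a high-probability program that passes none, which matches the abstract's stated aim of reusing failed programs instead of discarding them.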
Taxonomy
Topics: Model-Driven Software Engineering Techniques · Advanced Software Engineering Methodologies · Embedded Systems Design Techniques
Methods: Experience Replay · Prioritized Experience Replay
