FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system
Zeyuan Li, Yangfan He, Lewei He, Jianhui Wang, Tianyu Shi, Bin Lei, Yuchen Li, Qiuwu Chen

TL;DR
FALCON is a hierarchical, feedback-driven coding optimization system that enhances large language models' ability to generate accurate, user-aligned code by leveraging long-term and short-term memory mechanisms and meta-reinforcement learning.
Contribution
The paper introduces FALCON, a hierarchical, feedback-driven framework that uses meta-reinforcement learning to improve LLM code generation, targeting the limited dataset diversity and unhandled edge cases that hamper existing models.
Findings
Achieves an improvement of over 4.5% on the MBPP benchmark.
Achieves an improvement of over 6.1% on the HumanEval benchmark.
Outperforms existing reinforcement learning methods.
Abstract
Recently, large language models (LLMs) have achieved significant progress in automated code generation. Despite their strong instruction-following capabilities, these models frequently struggle to align with user intent in coding scenarios. In particular, they are hampered by datasets that lack diversity and fail to address specialized tasks or edge cases. Furthermore, challenges in supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) lead to failures in generating precise, human-intent-aligned code. To tackle these challenges and improve the code generation performance of automated programming systems, we propose Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization (FALCON). FALCON is structured into two hierarchical levels. At the global level, long-term memory improves code quality by retaining and applying learned…
Taxonomy
Topics: Neural Networks and Applications
Methods: ALIGN
