FunPRM: Function-as-Step Process Reward Model with Meta Reward Correction for Code Generation
Ruiyi Zhang, Peijia Qin, Qi Cao, Eric Xue, Pengtao Xie

TL;DR
FunPRM enhances code generation by organizing code into functions and applying a meta-learning reward correction, significantly improving performance and code quality across multiple large language models.
Contribution
Introduces FunPRM, a novel method that uses function-based step decomposition and meta reward correction to improve code generation with LLMs.
Findings
Outperforms existing test-time scaling methods on benchmark datasets.
Achieves state-of-the-art results on LiveCodeBench with O4-mini.
Produces more readable and reusable code.
Abstract
Code generation is a core application of large language models (LLMs), yet LLMs still frequently fail on complex programming tasks. Given their success in mathematical reasoning, test-time scaling approaches such as Process Reward Model (PRM)-based Best-of-N selection offer a promising way to improve performance. However, existing PRMs remain ineffective for code generation because code lacks a meaningful step decomposition and because Monte Carlo-estimated correctness scores (rewards) for partial solutions are noisy. To address these challenges, we propose FunPRM. FunPRM prompts LLMs to generate modular code organized into functions, and treats each function as a PRM reasoning step. Furthermore, FunPRM introduces a novel meta-learning-based reward correction mechanism that leverages clean final-solution rewards obtained via a unit-test-based evaluation system to purify noisy…
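To make the function-as-step idea concrete, here is a minimal Python sketch of PRM-based Best-of-N selection over function-level steps. It is an illustration under stated assumptions, not the paper's implementation: the helper names `split_into_function_steps` and `best_of_n`, the `prefix_scorer` callable standing in for a trained PRM, and the choice of `min` to aggregate step scores are all hypothetical. The meta reward correction step is not shown.

```python
import ast
from typing import Callable, List, Sequence


def split_into_function_steps(source: str) -> List[str]:
    """Return the source text of each top-level function, in file order.

    Each function plays the role of one PRM reasoning step (hypothetical helper).
    """
    tree = ast.parse(source)
    steps = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                steps.append(segment)
    return steps


def best_of_n(
    candidates: Sequence[str],
    prefix_scorer: Callable[[List[str]], float],
    aggregate: Callable = min,
) -> str:
    """Pick the candidate whose function-level steps score best.

    prefix_scorer stands in for a PRM: it receives the steps seen so far
    (a partial solution) and returns a reward for the latest step.
    Aggregating with min is an assumption made for this sketch.
    """

    def solution_score(code: str) -> float:
        steps = split_into_function_steps(code)
        if not steps:
            # No function-level steps to score: rank this candidate last.
            return float("-inf")
        return aggregate(prefix_scorer(steps[: i + 1]) for i in range(len(steps)))

    return max(candidates, key=solution_score)
```

In practice `prefix_scorer` would wrap a learned reward model; any callable with the same shape works for experimentation.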
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Machine Learning and Data Classification
