MemReward: Graph-Based Experience Memory for LLM Reward Prediction with Limited Labels
Tianyang Luo, Tao Feng, Zhigang Hua, Yan Xie, Shuang Yang, Ge Liu, Jiaxuan You

TL;DR
MemReward introduces a graph-based experience memory that propagates reward signals across similar rollouts, enabling effective reinforcement learning fine-tuning of large language models with limited labeled data.
Contribution
The paper presents a novel graph neural network framework that propagates rewards from labeled to unlabeled rollouts, reducing the need for extensive ground-truth labels in LLM reinforcement learning.
Findings
MemReward achieves over 96% of Oracle performance with only 20% labeled rollouts.
The framework effectively propagates rewards in mathematics, QA, and code generation tasks.
MemReward approaches Oracle performance on out-of-domain tasks.
Abstract
Reinforcement learning has emerged as a powerful paradigm for improving large language model (LLM) reasoning, where rollouts are sampled from the policy and reward signals computed on those rollouts are used to update the policy. However, in data-scarce scenarios, obtaining ground-truth labels to verify rollouts at scale often requires expensive human annotation or labor-intensive expert verification. For instance, evaluating mathematical proofs demands expert review, and open-ended question answering lacks definitive ground truth. When ground-truth labels are scarce, the effectiveness of reinforcement learning fine-tuning is constrained. Inspired by the success of semi-supervised learning in propagating labels from labeled to unlabeled samples, we propose MemReward, a graph-based experience memory framework that integrates reward propagation directly into online policy optimization.…
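The idea of propagating rewards from labeled to unlabeled rollouts can be illustrated with a minimal sketch. It uses classical label propagation over a k-nearest-neighbour similarity graph, not the graph neural network the paper describes, and everything in it is an assumption made for illustration: the toy 2-D rollout embeddings, the helper names `cosine`, `build_knn_graph`, and `propagate_rewards`, the choice `k=2`, and the neutral starting reward of 0.5.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_knn_graph(embeddings, k=2):
    """Connect each rollout to its k most similar rollouts (edges made symmetric)."""
    n = len(embeddings)
    weights = [dict() for _ in range(n)]
    for i in range(n):
        sims = sorted(
            ((cosine(embeddings[i], embeddings[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for s, j in sims[:k]:
            if s > 0:
                weights[i][j] = s
                weights[j][i] = s
    return weights

def propagate_rewards(weights, known, iters=50, default=0.5):
    """Keep labeled rewards fixed; set each unlabeled reward to the
    similarity-weighted mean of its neighbours, repeated for `iters` rounds."""
    n = len(weights)
    rewards = [known.get(i, default) for i in range(n)]
    for _ in range(iters):
        updated = list(rewards)
        for i in range(n):
            if i in known or not weights[i]:
                continue  # labeled or isolated nodes are left unchanged
            total = sum(weights[i].values())
            updated[i] = sum(w * rewards[j] for j, w in weights[i].items()) / total
        rewards = updated
    return rewards

# Hypothetical toy data: two clusters of rollout embeddings, one label per cluster.
embeddings = [
    [1.0, 0.1], [0.9, 0.2], [0.95, 0.0],   # cluster A
    [0.1, 1.0], [0.2, 0.9], [0.0, 0.95],   # cluster B
]
known = {0: 1.0, 3: 0.0}  # rollout 0 verified correct, rollout 3 verified incorrect
graph = build_knn_graph(embeddings, k=2)
rewards = propagate_rewards(graph, known)
print([round(r, 3) for r in rewards])
```

In this toy setup, two of six rollouts carry ground-truth rewards and the remaining four inherit estimates from their nearest labeled neighbours. In an online RL loop, such estimated rewards would stand in for missing labels when computing policy updates; how MemReward builds its graph and trains its predictor is not specified in the text above.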
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
