The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping

Yang Liu; Enxi Wang; Yufei Gao; Weixin Zhang; Bo Wang; Zhiyuan Zeng; Yikai Zhang; Yining Zheng; Xipeng Qiu

arXiv:2604.11297 · cs.LG · April 14, 2026

TL;DR

This paper introduces MEDS, a memory-enhanced reward shaping method that uses historical behavioral data to penalize recurring errors, thereby improving diversity and performance in reinforcement learning for language models.

Contribution

MEDS is a novel framework that incorporates past behavioral signals into reward design to reduce repeated mistakes and enhance exploration in language model training.

Findings

01 MEDS improves performance by up to 4.13 pass@1 points.

02 MEDS increases behavioral diversity during sampling.

03 MEDS yields consistent gains across five datasets and three base models.

Abstract

Despite the success of reinforcement learning for large language models, a common failure mode is reduced sampling diversity, where the policy repeatedly generates similar erroneous behaviors. Classical entropy regularization encourages randomness under the current policy, but does not explicitly discourage recurrent failure patterns across rollouts. We propose MEDS, a Memory-Enhanced Dynamic reward Shaping framework that incorporates historical behavioral signals into reward design. By storing and leveraging intermediate model representations, we capture features of past rollouts and use density-based clustering to identify frequently recurring error patterns. Rollouts assigned to more prevalent error clusters are penalized more heavily, encouraging broader exploration while reducing repeated mistakes. Across five datasets and three base models, MEDS consistently improves average…
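The mechanism described in the abstract (store features of past rollouts, cluster the failures by density, penalize failures in larger clusters more) can be illustrated with a small sketch. This is not the authors' implementation. Everything specific here is an assumption made for illustration: the toy 2-D feature vectors stand in for intermediate model representations, the clustering is a minimal hand-written DBSCAN, and the function names (`dbscan`, `shaped_rewards`), the parameters (`eps`, `min_pts`, `alpha`), and the penalty form (proportional to the cluster's share of failed rollouts) are hypothetical choices, not taken from the paper.

```python
import math
from collections import Counter


def region_query(points, i, eps):
    """Indices of all points within Euclidean distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


def shaped_rewards(features, rewards, eps=0.5, min_pts=2, alpha=0.5):
    """Subtract a penalty from failed rollouts that fall in dense error clusters.

    Assumed penalty (hypothetical): alpha * cluster_size / number_of_failed_rollouts,
    so rollouts in more prevalent error clusters are penalized more heavily.
    Failed rollouts labeled as noise and successful rollouts keep their base reward.
    """
    failed = [i for i, r in enumerate(rewards) if r <= 0.0]
    labels = dbscan([features[i] for i in failed], eps, min_pts)
    sizes = Counter(label for label in labels if label != -1)
    shaped = list(rewards)
    for i, label in zip(failed, labels):
        if label != -1:
            shaped[i] = rewards[i] - alpha * sizes[label] / len(failed)
    return shaped


# Toy memory: three near-identical failures, two isolated failures, one success.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (9.0, 9.0), (1.0, 1.0)]
rewards = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
shaped = shaped_rewards(features, rewards)
print(shaped)
```

In this toy run, the three clustered failures receive a lower reward than the two isolated failures, while the successful rollout is left unchanged. In this sketch that difference is what steers the policy away from the recurring error pattern instead of merely away from failure in general.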

Code & Models

Repositories

linxi000/MEDS (GitHub)
