Freshness-Aware Prioritized Experience Replay for LLM/VLM Reinforcement Learning

Weiyu Ma; Yongcheng Zeng; Yan Song; Xinyu Cui; Jian Zhao; Xuhui Liu; Mohamed Elhoseiny

arXiv:2604.16918 · cs.CL · April 21, 2026

TL;DR

This paper introduces Freshness-Aware Prioritized Experience Replay (PER), which improves sample efficiency in reinforcement learning for large language and vision-language models by addressing priority staleness: stored priorities that no longer reflect the current, rapidly changing policy.

Contribution

It proposes an age decay mechanism for PER that makes prioritized replay usable for LLM/VLM RL, and reports performance improvements across multiple tasks, including up to +367% on Sokoban.

Findings

01 Achieved up to +367% improvement on Sokoban.

02 Standard PER degrades performance without age decay.

03 First successful application of PER to LLM/VLM RL.

Abstract

Reinforcement Learning (RL) has achieved impressive success in post-training Large Language Models (LLMs) and Vision-Language Models (VLMs), with on-policy algorithms such as PPO, GRPO, and REINFORCE++ serving as the dominant paradigm. However, these methods discard all collected trajectories after a single gradient update, resulting in poor sample efficiency, particularly wasteful for agentic tasks where multi-turn environment interactions are expensive. While Experience Replay drives sample efficiency in classic RL by allowing agents to reuse past trajectories and prioritize informative ones, directly applying Prioritized Experience Replay (PER) to LLMs fails. The rapid policy evolution of billion-parameter models renders stored priorities stale, causing old high-priority trajectories to dominate sampling long after they have become uninformative. We propose Freshness-Aware PER, which…
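
The text above stops before the method is described, so the following is only a minimal sketch of the general idea of age-decayed priorities, not the authors' formulation. It assumes an exponential decay with a half-life measured in training steps; the names `Entry`, `effective_priority`, `sample`, and `half_life` are hypothetical and do not come from the paper or its repository.

```python
import random
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Entry:
    trajectory: Any    # stored rollout (opaque in this sketch)
    priority: float    # priority recorded when the trajectory was stored
    stored_step: int   # training step at which it was stored


def effective_priority(entry: Entry, current_step: int, half_life: float = 4.0) -> float:
    """Stored priority discounted by age (ASSUMED exponential decay, not the paper's formula)."""
    age = current_step - entry.stored_step
    return entry.priority * 0.5 ** (age / half_life)


def sample(buffer: List[Entry], current_step: int, k: int,
           half_life: float = 4.0, rng: random.Random = random) -> List[Entry]:
    """Draw k entries with probability proportional to their age-decayed priority."""
    weights = [effective_priority(e, current_step, half_life) for e in buffer]
    return rng.choices(buffer, weights=weights, k=k)


# An old, high-priority trajectory versus a fresh, lower-priority one.
old = Entry(trajectory="old rollout", priority=4.0, stored_step=0)
fresh = Entry(trajectory="fresh rollout", priority=2.0, stored_step=8)
batch = sample([old, fresh], current_step=8, k=4, rng=random.Random(0))
```

In this toy setup, sampling on the raw stored priorities would pick the old trajectory twice as often as the fresh one (4.0 versus 2.0). With the assumed decay, the old trajectory's weight at step 8 falls to 1.0 while the fresh one stays at 2.0, so the ratio is reversed. This is the kind of correction for stale priorities that the abstract motivates.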

Code & Models

Repositories

Vision-CAIR/Freshness-Aware-PER (GitHub)
