Reusing Trajectories in Policy Gradients Enables Fast Convergence

Alessandro Montenegro; Federico Mansutti; Marco Mussi; Matteo Papini; Alberto Maria Metelli

arXiv:2506.06178 · cs.LG · February 3, 2026

TL;DR

This paper introduces RT-PG, a new policy gradient algorithm that reuses past trajectories with importance weighting, significantly improving convergence rates and sample efficiency in reinforcement learning.
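To make the idea of reusing trajectories with importance weighting concrete, here is a minimal sketch in Python. It is not the RT-PG estimator from the paper. It assumes a one-step problem with a 1-D Gaussian policy, a hypothetical reward `-(a - 2)**2`, equal batch sizes, and generic mixture (balance-heuristic) importance weights. All function and variable names are illustrative.

```python
import numpy as np


def gaussian_pdf(a, mean, std):
    """Density of N(mean, std^2) evaluated at a."""
    return np.exp(-0.5 * ((a - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def reused_gradient(theta, past_thetas, past_actions, past_returns, std=1.0):
    """Estimate dJ/dtheta at `theta` from batches sampled under `past_thetas`.

    Assumption: batch j holds actions drawn from N(past_thetas[j], std^2),
    and all batches have the same size.
    """
    actions = np.concatenate(past_actions)
    returns = np.concatenate(past_returns)
    # Mixture density of the behavioural policies that produced the data.
    mixture = np.mean([gaussian_pdf(actions, t, std) for t in past_thetas], axis=0)
    # Importance weight of each sample under the current policy.
    weights = gaussian_pdf(actions, theta, std) / mixture
    # Score function of a Gaussian policy with respect to its mean.
    score = (actions - theta) / std**2
    return float(np.mean(weights * score * returns))


# Hypothetical setup: three past policies, the last one equal to the current one.
rng = np.random.default_rng(0)
past_thetas = [0.0, 0.3, 0.5]
past_actions = [rng.normal(t, 1.0, size=20_000) for t in past_thetas]
past_returns = [-(a - 2.0) ** 2 for a in past_actions]

theta = 0.5
estimate = reused_gradient(theta, past_thetas, past_actions, past_returns)
# For this reward, J(theta) = -((theta - 2)^2 + 1), so dJ/dtheta = -2 * (theta - 2).
analytic = -2.0 * (theta - 2.0)
print(estimate, analytic)
```

In this sketch, all three batches contribute to the gradient estimate at the current parameter, instead of only the batch sampled from the current policy. Dividing by the mixture density keeps the weights bounded whenever the current policy is one of the behavioural policies.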

Contribution

The paper provides the first theoretical analysis showing that reusing past trajectories accelerates policy gradient convergence, achieving the best known rates.

Findings

01

RT-PG achieves a sample complexity of $O(ϵ^{-2} ω^{-1})$

02

Reusing all past trajectories yields an O() convergence rate

03

Empirical results confirm the effectiveness of trajectory reuse in practice.

Abstract

Policy gradient (PG) methods are a class of effective reinforcement learning algorithms, particularly when dealing with continuous control problems. They rely on fresh on-policy data, making them sample-inefficient and requiring $O(ϵ^{-2})$ trajectories to reach an $ϵ$-approximate stationary point. A common strategy to improve efficiency is to reuse information from past iterations, such as previous gradients or trajectories, leading to off-policy PG methods. While gradient reuse has received substantial attention, leading to improved rates up to $O(ϵ^{-3/2})$, the reuse of past trajectories, although intuitive, remains largely unexplored from a theoretical perspective. In this work, we provide the first rigorous theoretical evidence that reusing past off-policy trajectories can significantly accelerate PG convergence. We propose RT-PG (Reusing Trajectories - Policy…

Taxonomy

Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research