Reusing Trajectories in Policy Gradients Enables Fast Convergence
Alessandro Montenegro, Federico Mansutti, Marco Mussi, Matteo Papini, Alberto Maria Metelli

TL;DR
This paper introduces RT-PG, a new policy gradient algorithm that reuses past trajectories with importance weighting, significantly improving convergence rates and sample efficiency in reinforcement learning.
Contribution
The paper provides the first theoretical analysis showing that reusing past trajectories accelerates policy gradient convergence, achieving the best known rates.
Findings
RT-PG achieves a sample complexity of O(ε^{-2}ω^{-1})
Reusing all past trajectories yields an O() convergence rate
Empirical results confirm the effectiveness of trajectory reuse in practice.
Abstract
Policy gradient (PG) methods are a class of effective reinforcement learning algorithms, particularly when dealing with continuous control problems. They rely on fresh on-policy data, making them sample-inefficient and requiring trajectories to reach an ε-approximate stationary point. A common strategy to improve efficiency is to reuse information from past iterations, such as previous gradients or trajectories, leading to off-policy PG methods. While gradient reuse has received substantial attention, leading to improved rates up to , the reuse of past trajectories, although intuitive, remains largely unexplored from a theoretical perspective. In this work, we provide the first rigorous theoretical evidence that reusing past off-policy trajectories can significantly accelerate PG convergence. We propose RT-PG (Reusing Trajectories - Policy…
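To make the idea of trajectory reuse concrete, here is a minimal sketch of an importance-weighted REINFORCE-style gradient estimate that pools trajectories collected under several earlier policies. It is only an illustration under stated assumptions, not the RT-PG estimator itself, whose exact weighting scheme is not given in this summary. The one-dimensional Gaussian policy a ~ N(theta * s, sigma^2), the function names (log_prob, score, reuse_gradient), and the data layout are all hypothetical choices made for the example.

```python
import numpy as np


def log_prob(theta, states, actions, sigma=1.0):
    """Log-density of the actions under the assumed policy a ~ N(theta * s, sigma^2)."""
    z = (actions - theta * states) / sigma
    return float(np.sum(-0.5 * z**2 - np.log(sigma * np.sqrt(2.0 * np.pi))))


def score(theta, states, actions, sigma=1.0):
    """Gradient of log_prob with respect to theta."""
    return float(np.sum((actions - theta * states) * states / sigma**2))


def reuse_gradient(theta, batches):
    """Importance-weighted gradient estimate from trajectories of past policies.

    batches: list of (theta_behavior, trajectories), where each trajectory is a
    tuple (states, actions, ret) and ret is the return of that trajectory.
    Plain per-trajectory importance weights are used here (an assumption).
    """
    total, count = 0.0, 0
    for theta_b, trajectories in batches:
        for states, actions, ret in trajectories:
            # Transition probabilities cancel in the ratio, so only the
            # policy terms of the trajectory density are needed.
            w = np.exp(log_prob(theta, states, actions)
                       - log_prob(theta_b, states, actions))
            total += w * score(theta, states, actions) * ret
            count += 1
    return total / count
```

A trajectory collected under the current parameter gets weight 1, so the estimate reduces to the usual on-policy one. Trajectories from older parameters are reweighted by how likely their actions are under the current policy. Plain importance weights like these can have high variance when the policies differ a lot, which is one reason practical methods use more careful weighting.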
Taxonomy
Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research
