Policy Gradient with Active Importance Sampling

Matteo Papini; Giorgio Manganini; Alberto Maria Metelli; Marcello Restelli

arXiv:2405.05630 · cs.LG · May 10, 2024

TL;DR

This paper introduces an active importance sampling approach for policy gradient methods in reinforcement learning, optimizing behavioral policies to minimize variance and improve learning efficiency.

Contribution

It proposes an iterative algorithm that optimizes behavioral policies for variance reduction using defensive importance sampling, with theoretical convergence analysis and practical validation.
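
As a rough illustration of defensive importance sampling in its textbook form (not the paper's algorithm, whose details are not given here), the sketch below samples from a mixture of the target distribution and a behavioral distribution. The mixture keeps every importance weight below 1/beta. The one-dimensional Gaussians, the mixing coefficient `beta`, and the function names are all assumptions chosen for the example.

```python
import math
import random


def normal_pdf(x, std):
    """Density of a zero-mean Gaussian with standard deviation `std`."""
    return math.exp(-0.5 * (x / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def defensive_is(n, beta, target_std=1.0, behavioral_std=0.5, seed=0):
    """Estimate E_p[x^2] by sampling from beta * p + (1 - beta) * q.

    p is the target N(0, target_std^2); q is a narrower behavioral
    N(0, behavioral_std^2). Both are toy choices for illustration.
    """
    rng = random.Random(seed)
    total, max_weight = 0.0, 0.0
    for _ in range(n):
        # Draw from the mixture: the target with probability beta, else q.
        std = target_std if rng.random() < beta else behavioral_std
        x = rng.gauss(0.0, std)
        p = normal_pdf(x, target_std)
        mixture = beta * p + (1.0 - beta) * normal_pdf(x, behavioral_std)
        weight = p / mixture  # never exceeds 1 / beta
        total += weight * x * x
        max_weight = max(max_weight, weight)
    return total / n, max_weight


estimate, max_weight = defensive_is(50000, beta=0.2)
print("estimate of E_p[x^2]:", estimate)
print("largest weight:", max_weight, "bound 1/beta:", 1.0 / 0.2)
```

Sampling from the narrow behavioral distribution alone would give weights that grow without bound in the tails. The mixture trades a little efficiency for a hard cap on the weights, which is the usual motivation for the defensive variant.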

Findings

01. Reduced policy gradient variance compared to standard methods
02. Faster learning speed in reinforcement learning tasks
03. Theoretical convergence rate of the proposed algorithm

Abstract

Importance sampling (IS) is a fundamental technique underlying a large family of off-policy reinforcement learning (RL) approaches. Policy gradient (PG) methods, in particular, benefit significantly from IS, which enables the effective reuse of previously collected samples and thus increases sample efficiency. Classically, however, IS is employed in RL as a passive tool for re-weighting historical samples. The statistical community, by contrast, employs IS as an active tool, combined with behavioral distributions that can reduce the variance of the estimate even below that of the sample mean. In this paper, we focus on this second setting by addressing the behavioral policy optimization (BPO) problem: we look for the best behavioral policy from which to collect samples so as to reduce the policy gradient variance as much as possible. We provide an iterative algorithm that alternates between the…
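
The claim that an actively chosen behavioral distribution can beat the sample mean is easy to see on a toy problem. The sketch below is not from the paper and involves no policies or gradients; it estimates a Gaussian tail probability, once by plain Monte Carlo and once by sampling from a shifted behavioral distribution and re-weighting. The threshold, the shift, and the function names are assumptions made for the example.

```python
import math
import random
import statistics


def normal_pdf(x, mean=0.0):
    """Density of a unit-variance Gaussian centered at `mean`."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def estimate_tail(n, behavioral_mean, threshold=3.0, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) from behavioral samples.

    Samples come from N(behavioral_mean, 1) and are re-weighted by the
    density ratio target / behavioral. With behavioral_mean = 0 the
    weights are all 1 and this reduces to the plain sample mean.
    """
    rng = random.Random(seed)
    terms = []
    for _ in range(n):
        x = rng.gauss(behavioral_mean, 1.0)
        weight = normal_pdf(x) / normal_pdf(x, behavioral_mean)
        terms.append(weight if x > threshold else 0.0)
    # Return the estimate and the per-sample variance of its terms.
    return statistics.fmean(terms), statistics.variance(terms)


true_value = 0.5 * math.erfc(3.0 / math.sqrt(2.0))
mc_mean, mc_var = estimate_tail(20000, behavioral_mean=0.0)  # passive baseline
is_mean, is_var = estimate_tail(20000, behavioral_mean=3.0)  # active choice
print("true value:", true_value)
print("sample mean:", mc_mean, "per-sample variance:", mc_var)
print("importance sampling:", is_mean, "per-sample variance:", is_var)
```

Centering the behavioral distribution on the region that matters lowers the per-sample variance by roughly two orders of magnitude in this setup. Choosing such a distribution automatically, for the policy gradient rather than a scalar tail probability, is the BPO problem the abstract describes.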

Taxonomy

Topics: Statistical Methods and Inference · Simulation Techniques and Applications