Actor-Critic with Active Importance Sampling
Majid Molaei, Gabor Paczolay, Matteo Papini, Alberto Maria Metelli, Marcello Restelli

TL;DR
The paper presents AISAC, an Actor-Critic extension that reduces the variance of policy gradient estimates by optimizing the behavior policy used to collect data, with importance sampling keeping the gradient estimates unbiased. The result is faster and more stable reinforcement learning.
Contribution
AISAC introduces a method for optimizing the behavior policy within the Actor-Critic framework so that it minimizes the variance of policy gradient estimates, which improves learning efficiency.
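The core idea can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation. It assumes a single fixed state, a one-dimensional Gaussian target policy π = N(μ, σ²), and a hypothetical linear critic Q(a) = a; the names `critic` and `gradient_samples` are illustrative. Actions are drawn from a separate Gaussian behavior policy β and reweighted by ρ = π/β, which keeps the estimate of ∂/∂μ E_π[Q] unbiased for any β that covers π. The choice of β changes only the variance.

```python
import numpy as np

def gaussian_logpdf(a, mean, std):
    # Log-density of N(mean, std^2) evaluated elementwise at a.
    return -0.5 * ((a - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)

def critic(a):
    # Hypothetical linear critic Q(s, a) = a for one fixed state s.
    return a

def gradient_samples(mu, sigma, beh_mean, beh_std, n, rng):
    """Per-sample importance-sampled terms whose mean estimates d/dmu E_pi[Q].

    pi = N(mu, sigma^2) is the target policy; beta = N(beh_mean, beh_std^2) is
    the behavior policy. Each term is rho(a) * score(a) * (Q(a) - Q(mu)) with
    rho = pi / beta, so the average is unbiased for any beta that covers pi.
    """
    a = rng.normal(beh_mean, beh_std, size=n)           # actions from beta
    rho = np.exp(gaussian_logpdf(a, mu, sigma) - gaussian_logpdf(a, beh_mean, beh_std))
    score = (a - mu) / sigma**2                          # d/dmu log pi(a)
    advantage = critic(a) - critic(mu)                   # constant baseline Q(mu)
    return rho * score * advantage

rng = np.random.default_rng(0)
mu, sigma = 0.5, 1.0
on_policy = gradient_samples(mu, sigma, mu, sigma, 200_000, rng)
off_policy = gradient_samples(mu, sigma, mu, np.sqrt(3.0) * sigma, 200_000, rng)
print(on_policy.mean(), on_policy.var())    # mean ~ 1, variance ~ 2
print(off_policy.mean(), off_policy.var())  # mean ~ 1, variance ~ 0.45
```

For this toy critic the true gradient is 1. The behavior standard deviation √3·σ is chosen by hand here; for this critic it minimizes the per-sample variance among Gaussians centered at μ, reducing it from 2 on-policy to about 0.45 while both estimates stay unbiased.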
Findings
AISAC reduces gradient variance and improves sample efficiency.
Experiments show faster convergence and increased stability.
The method outperforms standard Actor-Critic on continuous control tasks (Inverted Pendulum and Half Cheetah).
Abstract
This paper introduces the Active-Importance-Sampling Actor-Critic (AISAC) algorithm, an extension of the Actor-Critic framework for reducing variance in policy gradient estimation. AISAC optimizes the behavior policy to minimize gradient variance while preserving unbiased gradient estimates. Using importance sampling principles, the algorithm adapts the behavior policy toward efficient data collection distributions aligned with target policy gradients. For continuous action spaces, AISAC employs Gaussian behavior policies optimized through cross-entropy minimization. We provide theoretical analysis demonstrating variance reduction and unbiasedness. Experiments on Inverted Pendulum and Half Cheetah tasks show improved learning speed, sample efficiency, and training stability compared to standard Actor-Critic methods. Results indicate that optimizing the behavior policy improves both…
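The abstract states that Gaussian behavior policies are optimized through cross-entropy minimization. One standard reading, assumed here rather than taken from the paper, is the cross-entropy method for importance sampling. For a scalar gradient component, the variance-minimizing behavior density is q*(a) ∝ π(a)·|∇_μ log π(a) · A(a)|, and minimizing the cross-entropy −E_{q*}[log q_θ] over Gaussians q_θ reduces to a weighted maximum-likelihood fit (a weighted mean and standard deviation). The sketch uses a single state, a hypothetical linear critic Q(a) = a with baseline Q(μ), and illustrative names such as `fit_behavior_cem`.

```python
import numpy as np

def gaussian_logpdf(a, mean, std):
    # Log-density of N(mean, std^2) evaluated elementwise at a.
    return -0.5 * ((a - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)

def critic(a):
    # Hypothetical linear critic Q(s, a) = a for one fixed state s.
    return a

def fit_behavior_cem(mu, sigma, n=50_000, iters=5, seed=0):
    """Fit a Gaussian behavior policy N(beh_mean, beh_std^2) by cross-entropy.

    Target density: q*(a) proportional to pi(a) * |score(a) * advantage(a)|,
    where pi = N(mu, sigma^2). Each iteration samples from the current behavior
    policy, importance-weights the samples toward q*, and refits the Gaussian
    by weighted maximum likelihood.
    """
    rng = np.random.default_rng(seed)
    beh_mean, beh_std = mu, sigma                 # start from the target policy
    for _ in range(iters):
        a = rng.normal(beh_mean, beh_std, size=n)
        log_pi = gaussian_logpdf(a, mu, sigma)
        log_beh = gaussian_logpdf(a, beh_mean, beh_std)
        f = (a - mu) / sigma**2 * (critic(a) - critic(mu))   # score * advantage
        # Self-normalized weights toward q*: pi(a) * |f(a)| / beta(a).
        w = np.exp(log_pi - log_beh) * np.abs(f)
        w /= w.sum()
        # Weighted MLE of a Gaussian = minimizer of the cross-entropy to q*.
        beh_mean = np.sum(w * a)
        beh_std = np.sqrt(np.sum(w * (a - beh_mean) ** 2))
    return beh_mean, beh_std

print(fit_behavior_cem(0.5, 1.0))   # roughly (0.5, 1.73)
```

For this critic the fit converges to approximately N(μ, 3σ²), which is also the lowest-variance behavior policy among Gaussians centered at μ. In a full actor-critic loop, the fit would be repeated per state as the actor and critic change.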
