On Many-Actions Policy Gradient

Michal Nauman; Marek Cygan

arXiv:2210.13011·cs.LG·November 1, 2023

TL;DR

This paper analyzes the variance of stochastic policy gradients (SPGs) estimated with many action samples per state and proposes a model-based approach that improves sample efficiency and returns in continuous-action environments.

Contribution

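The paper introduces Model-Based Many-Actions (MBMA), a method that uses a dynamics model to draw many action samples per state for SPG estimation. MBMA yields lower bias than SPG estimated from states in model-simulated rollouts and improves sample efficiency.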

Findings

01

MBMA achieves lower bias than, and variance comparable to, SPG estimated from states in model-simulated rollouts.

02

MBMA improves sample efficiency and returns on continuous-action tasks relative to model-free, many-actions, and model-based on-policy SPG baselines.

03

The empirically observed bias and variance structure of MBMA matches the structure predicted by theory.

Abstract

We study the variance of stochastic policy gradients (SPGs) with many action samples per state. We derive a many-actions optimality condition, which determines when many-actions SPG yields lower variance as compared to a single-action agent with proportionally extended trajectory. We propose Model-Based Many-Actions (MBMA), an approach leveraging dynamics models for many-actions sampling in the context of SPG. MBMA addresses issues associated with existing implementations of many-actions SPG and yields lower bias and comparable variance to SPG estimated from states in model-simulated rollouts. We find that MBMA bias and variance structure matches that predicted by theory. As a result, MBMA achieves improved sample efficiency and higher returns on a range of continuous action environments as compared to model-free, many-actions, and model-based on-policy SPG baselines.
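
The basic effect behind many-actions sampling can be illustrated with a toy score-function estimator. The sketch below is not MBMA and does not implement the paper's many-actions optimality condition; it only shows that, for a fixed number of sampled states, averaging several action samples per state lowers the variance of the gradient estimate. The quadratic critic, the uniform state distribution, the unit-variance Gaussian policy, and all function and parameter names are assumptions made for illustration.

```python
import random
import statistics


def score_function_gradient(state, n_actions, policy_mean, rng):
    """Estimate d/d(policy_mean) of E_a[Q(state, a)] for a ~ N(policy_mean, 1),
    averaging the score-function estimate over n_actions sampled actions."""
    total = 0.0
    for _ in range(n_actions):
        action = rng.gauss(policy_mean, 1.0)
        q_value = -(action - state) ** 2   # hypothetical critic, not from the paper
        score = action - policy_mean       # d/d(mean) of log N(action; mean, 1)
        total += q_value * score
    return total / n_actions


def estimator_variance(n_states, n_actions, trials=2000, seed=0):
    """Empirical variance of the batch gradient estimate across independent trials."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        per_state = []
        for _ in range(n_states):
            state = rng.uniform(-1.0, 1.0)  # hypothetical i.i.d. state distribution
            per_state.append(score_function_gradient(state, n_actions, 0.0, rng))
        estimates.append(statistics.fmean(per_state))
    return statistics.variance(estimates)


# Same number of states in both cases; only the actions per state differ.
single_action_var = estimator_variance(n_states=8, n_actions=1)
many_actions_var = estimator_variance(n_states=8, n_actions=16)
print(f"single-action variance: {single_action_var:.3f}")
print(f"many-actions variance:  {many_actions_var:.3f}")
```

In this toy setting the states are drawn independently and the critic is exact, so the comparison ignores the trajectory correlations and critic or model errors that the paper's analysis accounts for.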

Code & Models

Repositories

papersubmissions-anon/daa-ppo
PyTorch · Official

Videos

On Many-Actions Policy Gradient · SlidesLive

Taxonomy

Topics: Simulation Techniques and Applications · Reinforcement Learning in Robotics · Age of Information Optimization