Understanding the Effect of Stochasticity in Policy Optimization

Jincheng Mei; Bo Dai; Chenjun Xiao; Csaba Szepesvari; Dale Schuurmans

arXiv:2110.15572 · cs.LG · November 1, 2021

TL;DR

This paper investigates how stochasticity influences on-policy policy optimization, revealing fundamental trade-offs and proposing an ensemble method to reliably find near-optimal policies.

Contribution

It introduces the concept of committal rate, analyzes the trade-offs in stochastic policy optimization, and develops an ensemble method for high-probability near-optimal solutions.

Findings

01. Stochastic gradients limit geometric acceleration in policy optimization.

02. A trade-off exists between convergence speed and global optimality without external information.

03. The proposed ensemble method guarantees near-optimal solutions with high probability.

Abstract

We study the effect of stochasticity in on-policy policy optimization, and make the following four contributions. First, we show that the preferability of optimization methods depends critically on whether stochastic versus exact gradients are used. In particular, unlike the true gradient setting, geometric information cannot be easily exploited in the stochastic case for accelerating policy optimization without detrimental consequences or impractical assumptions. Second, to explain these findings we introduce the concept of committal rate for stochastic policy optimization, and show that this can serve as a criterion for determining almost sure convergence to global optimality. Third, we show that in the absence of external oracle information, which allows an algorithm to determine the difference between optimal and sub-optimal actions given only on-policy samples, there is an inherent…
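The distinction between exact and stochastic gradients drawn in the abstract can be made concrete on a small example. The sketch below is not taken from the paper: it assumes a softmax policy over a three-armed bandit with made-up deterministic rewards, and it uses a standard importance-sampling estimator for the on-policy gradient. All function names (`exact_gradient`, `stochastic_gradient`, `run`) and all numeric values are hypothetical choices for illustration only.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def exact_gradient(theta, rewards, rng=None):
    """Exact gradient of the expected reward with respect to the logits.

    Requires the full reward vector, which an on-policy learner does not see.
    The unused rng argument keeps the signature interchangeable with the
    stochastic version below.
    """
    pi = softmax(theta)
    expected = sum(p * r for p, r in zip(pi, rewards))
    return [p * (r - expected) for p, r in zip(pi, rewards)]


def stochastic_gradient(theta, rewards, rng):
    """Unbiased on-policy estimate built from a single sampled action."""
    pi = softmax(theta)
    a = rng.choices(range(len(theta)), weights=pi)[0]
    # Importance-sampling reward estimate: r[a] / pi[a] for the sampled
    # action and zero for every other action.
    r_hat = [0.0] * len(theta)
    r_hat[a] = rewards[a] / pi[a]
    expected_hat = sum(p * r for p, r in zip(pi, r_hat))
    return [p * (r - expected_hat) for p, r in zip(pi, r_hat)]


def run(gradient_fn, rewards, steps, lr, seed=0):
    """Plain gradient ascent on the logits, starting from the uniform policy."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    for _ in range(steps):
        grad = gradient_fn(theta, rewards, rng)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return softmax(theta)


if __name__ == "__main__":
    rewards = [1.0, 0.8, 0.2]  # hypothetical rewards; action 0 is optimal
    print("exact:     ", run(exact_gradient, rewards, steps=2000, lr=0.4))
    for seed in range(5):
        policy = run(stochastic_gradient, rewards, steps=2000, lr=0.4, seed=seed)
        print("stochastic:", policy)
```

Running the stochastic variant over several seeds and comparing the final policies with the exact-gradient run is one simple way to see how sampling noise changes the behaviour of the same update rule. The sketch makes no claim about which outcomes occur or how often.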

Videos

Understanding the Effect of Stochasticity in Policy Optimization · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research