On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control

Amrit Singh Bedi; Anjaly Parayil; Junyu Zhang; Mengdi Wang; Alec Koppel

arXiv:2106.08414·cs.LG·January 4, 2023·1 cites

Open Access

TL;DR

This paper investigates how heavy-tailed policy parameterizations influence the convergence, stability, and performance of policy search in continuous control reinforcement learning, addressing challenges posed by non-convexity.

Contribution

The paper introduces a novel analysis that links the tail index of a policy to convergence rates, local maxima, and stability in continuous-space policy search.

Findings

1. Convergence rate depends on tail index alpha and other parameters.

2. Heavier tails lead to wider local maxima and improved stability.

3. Policy performance improves with heavier-tailed distributions, especially under misaligned incentives.

Abstract

Reinforcement learning is a framework for interactive decision-making with incentives sequentially revealed across time without a system dynamics model. Due to its scaling to continuous spaces, we focus on policy search where one iteratively improves a parameterized policy with stochastic policy gradient (PG) updates. In tabular Markov Decision Problems (MDPs), under persistent exploration and suitable parameterization, global optimality may be obtained. By contrast, in continuous space, the non-convexity poses a pathological challenge as evidenced by existing convergence results being mostly limited to stationarity or arbitrary local extrema. To close this gap, we step towards persistent exploration in continuous space through policy parameterizations defined by distributions of heavier tails defined by tail-index parameter alpha, which increases the likelihood of jumping in state…
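
To make the role of the tails concrete, here is a minimal sketch that compares how often a Gaussian policy and a heavy-tailed policy sample an action far from the policy mean. The Cauchy distribution is used as a standard heavy-tailed example: within the alpha-stable family it has tail index 1, while the Gaussian has tail index 2. Whether this matches the paper's exact parameterization is an assumption, and every function name, the threshold of 5 scale units, and the sample size are hypothetical choices made for illustration.

```python
import math
import random


def sample_gaussian_action(rng, mean, scale):
    """Draw one action from a Gaussian policy centered at `mean`."""
    return rng.gauss(mean, scale)


def sample_cauchy_action(rng, mean, scale):
    """Draw one action from a Cauchy policy via inverse-CDF sampling."""
    u = rng.random()  # uniform on [0, 1)
    return mean + scale * math.tan(math.pi * (u - 0.5))


def far_jump_fraction(sampler, mean, scale, threshold, n, seed):
    """Fraction of n sampled actions farther than threshold * scale from mean."""
    rng = random.Random(seed)  # seeded so the comparison is reproducible
    far = sum(
        1
        for _ in range(n)
        if abs(sampler(rng, mean, scale) - mean) > threshold * scale
    )
    return far / n


# Illustrative settings (assumptions, not values from the paper).
gauss_frac = far_jump_fraction(sample_gaussian_action, 0.0, 1.0, 5.0, 100_000, seed=0)
cauchy_frac = far_jump_fraction(sample_cauchy_action, 0.0, 1.0, 5.0, 100_000, seed=0)
print(f"Gaussian: {gauss_frac:.5f}, Cauchy: {cauchy_frac:.5f}")
```

With the same mean and scale, the Cauchy policy places roughly an eighth of its probability mass beyond 5 scale units, whereas the Gaussian places almost none there. This is the sense in which a heavier tail raises the likelihood of large exploratory jumps.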

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Smart Grid Energy Management