On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control
Amrit Singh Bedi, Anjaly Parayil, Junyu Zhang, Mengdi Wang, Alec Koppel

TL;DR
This paper investigates how heavy-tailed policy parameterizations influence the convergence, stability, and performance of policy search in continuous control reinforcement learning, addressing challenges posed by non-convexity.
Contribution
The paper introduces an analysis linking the tail index of the policy distribution to convergence rates, the local maxima reached, and stability in continuous policy search.
Findings
The convergence rate depends on the tail index alpha, among other problem parameters.
Heavier tails lead to wider local maxima and improved stability.
Policy performance improves with heavier-tailed distributions, especially under misaligned incentives.
Abstract
Reinforcement learning is a framework for interactive decision-making with incentives sequentially revealed across time without a system dynamics model. Due to its scaling to continuous spaces, we focus on policy search, where one iteratively improves a parameterized policy with stochastic policy gradient (PG) updates. In tabular Markov Decision Problems (MDPs), under persistent exploration and suitable parameterization, global optimality may be obtained. By contrast, in continuous space, the non-convexity poses a pathological challenge, as evidenced by existing convergence results being mostly limited to stationarity or arbitrary local extrema. To close this gap, we step towards persistent exploration in continuous space through policy parameterizations defined by heavier-tailed distributions with tail-index parameter alpha, which increases the likelihood of jumping in state…
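As a rough illustration of why a heavier-tailed policy explores more persistently, the sketch below compares a Gaussian policy with a Cauchy policy (a familiar heavy-tailed distribution with tail index 1). The Cauchy choice, the function names, and the scalar location-scale setup are assumptions made for this example, not the paper's parameterization. The sketch estimates how often each policy samples an action more than three scale units from its mean (in theory about 0.3% for the Gaussian and about 20% for the Cauchy) and gives the score function with respect to the mean, which is the quantity a stochastic PG update multiplies by the return.

```python
import math
import random


def sample_gaussian_action(mean, scale, rng):
    """Light-tailed baseline: a ~ Normal(mean, scale^2)."""
    return rng.gauss(mean, scale)


def sample_cauchy_action(mean, scale, rng):
    """Heavy-tailed example (tail index 1), sampled via the inverse CDF."""
    u = rng.random()
    return mean + scale * math.tan(math.pi * (u - 0.5))


def gaussian_score(action, mean, scale):
    """d/d(mean) of the log Normal density; grows linearly in (action - mean)."""
    return (action - mean) / scale ** 2


def cauchy_score(action, mean, scale):
    """d/d(mean) of the log Cauchy density; its magnitude never exceeds 1 / scale."""
    diff = action - mean
    return 2.0 * diff / (scale ** 2 + diff ** 2)


def far_action_fraction(sampler, mean, scale, n, rng, k=3.0):
    """Fraction of n sampled actions farther than k * scale from the mean."""
    far = sum(abs(sampler(mean, scale, rng) - mean) > k * scale for _ in range(n))
    return far / n


# Hypothetical setup: scalar action, mean 0, scale 1, fixed seed.
rng = random.Random(0)
gauss_far = far_action_fraction(sample_gaussian_action, 0.0, 1.0, 20000, rng)
cauchy_far = far_action_fraction(sample_cauchy_action, 0.0, 1.0, 20000, rng)
print(f"Gaussian: {gauss_far:.4f}, Cauchy: {cauchy_far:.4f}")
```

In this toy setting the Cauchy policy takes far-from-mean actions much more often, while its score stays bounded even for extreme actions, so a single large jump does not produce an arbitrarily large gradient term. The Gaussian score, by contrast, grows without bound as the action moves away from the mean.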
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Smart Grid Energy Management
