Functional Acceleration for Policy Mirror Descent
Veronica Chelu, Doina Precup

TL;DR
This paper introduces a functional acceleration technique for Policy Mirror Descent in Reinforcement Learning, enabling large-scale, parametrization-independent optimization with theoretical analysis and numerical validation.
Contribution
It proposes a duality-based, momentum-enhanced PMD update that is independent of policy parametrization, extending the applicability of momentum methods in RL.
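For orientation, the standard PMD step that such a momentum term would build on can be written as follows. The notation is an assumption, since this listing defines none: $\eta$ is a step size, $Q^{\pi_k}$ is the action-value function of the current policy, $\Delta(\mathcal{A})$ is the probability simplex over actions, and $D_h$ is the Bregman divergence of a mirror map $h$. The paper's accelerated update is not reproduced here.

```latex
% Standard (unaccelerated) PMD step, applied independently at each state s
\pi_{k+1}(\cdot \mid s)
  = \arg\max_{p \in \Delta(\mathcal{A})}
    \Big\{ \eta \,\big\langle Q^{\pi_k}(s,\cdot),\, p \big\rangle
           - D_h\big(p,\ \pi_k(\cdot \mid s)\big) \Big\}

% With the negative-entropy mirror map, D_h is the KL divergence and the
% step has the closed form
\pi_{k+1}(a \mid s) \;\propto\; \pi_k(a \mid s)\,
  \exp\!\big(\eta\, Q^{\pi_k}(s,a)\big)
```

The update is stated over the policy as a function on the simplex, not over parameters, which is the sense in which a functional method can be independent of the parametrization.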
Findings
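Because the update is defined in function space, it applies to large-scale optimization regardless of how the policy is parametrized.
Several theoretical properties of the method are analyzed.
A numerical ablation study illustrates policy optimization dynamics on the value polytope under different algorithmic design choices.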
Abstract
We apply functional acceleration to the general family of Policy Mirror Descent (PMD) algorithms, which covers a wide range of novel and fundamental methods in Reinforcement Learning (RL). Leveraging duality, we propose a momentum-based PMD update. By taking the functional route, our approach is independent of the policy parametrization and applicable to large-scale optimization, covering previous applications of momentum at the level of policy parameters as a special case. We theoretically analyze several properties of this approach and complement this analysis with a numerical ablation study, which serves to illustrate the policy optimization dynamics on the value polytope, relative to different algorithmic design choices in this space. We further characterize numerically several features of the problem setting relevant for functional acceleration, and lastly, we investigate the impact of…
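As a rough illustration of what a momentum term in a PMD step might look like, here is a minimal tabular sketch. It assumes exact policy evaluation, the KL (negative-entropy) form of PMD, and a hypothetical extrapolation term `beta * (q - q_prev)` added to the update direction. The function names, the parameters `eta` and `beta`, and the extrapolation rule are all assumptions made for this example and are not taken from the paper, whose actual update may differ.

```python
import numpy as np


def q_values(P, r, pi, gamma):
    """Exact Q^pi for a tabular MDP.

    P: transitions, shape (S, A, S); r: rewards, shape (S, A);
    pi: policy, shape (S, A) with rows summing to 1.
    """
    S, _ = r.shape
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state transitions under pi
    r_pi = np.sum(pi * r, axis=1)           # expected one-step reward under pi
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * P @ v                # shape (S, A)


def pmd_kl(P, r, gamma=0.9, eta=1.0, beta=0.0, iters=50):
    """KL-regularized PMD with an optional, hypothetical momentum term.

    beta = 0 gives the plain step pi_{k+1} ∝ pi_k * exp(eta * Q_k).
    beta > 0 extrapolates the direction using the previous Q (an assumption
    for illustration, not the paper's update).
    """
    S, A = r.shape
    log_pi = np.full((S, A), -np.log(A))    # start from the uniform policy
    q_prev = None
    for _ in range(iters):
        q = q_values(P, r, np.exp(log_pi), gamma)
        direction = q if q_prev is None else q + beta * (q - q_prev)
        logits = log_pi + eta * direction
        # Normalize in log space (log-sum-exp) for numerical stability.
        m = logits.max(axis=1, keepdims=True)
        log_pi = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
        q_prev = q
    return np.exp(log_pi)
```

Calling `pmd_kl(P, r, beta=0.0)` and `pmd_kl(P, r, beta=0.5)` on the same small random MDP gives two policies whose state values can be compared across iteration counts. Nothing in the update refers to policy parameters, so it operates directly on the policy table.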
Taxonomy
Topics: Cloud Computing and Resource Management
