Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes
Washim Uddin Mondal, Vaneet Aggarwal

TL;DR
This paper introduces ANPG, an accelerated natural policy gradient algorithm that improves sample complexity for infinite horizon discounted reward MDPs without requiring variance bounds on importance sampling weights.
Contribution
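The paper proposes ANPG, a first-order natural policy gradient method that uses accelerated stochastic gradient descent to compute the natural gradient direction. It achieves improved sample complexity bounds under general parameterization without the unverifiable assumption that the variance of importance sampling weights is bounded.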
Findings
Achieves $\mathcal{O}\left(\frac{1}{\epsilon^2}\right)$ sample complexity.
Improves previous bounds by a $\log\left(\frac{1}{\epsilon}\right)$ factor.
Matches state-of-the-art iteration complexity without variance assumptions.
Abstract
We consider the problem of designing sample-efficient learning algorithms for infinite horizon discounted reward Markov Decision Processes. Specifically, we propose the Accelerated Natural Policy Gradient (ANPG) algorithm, which utilizes an accelerated stochastic gradient descent process to obtain the natural policy gradient. ANPG achieves $\mathcal{O}(\epsilon^{-2})$ sample complexity and $\mathcal{O}(\epsilon^{-1})$ iteration complexity with general parameterization, where $\epsilon$ defines the optimality error. This improves the state-of-the-art sample complexity by a $\log\left(\frac{1}{\epsilon}\right)$ factor. ANPG is a first-order algorithm and, unlike some existing literature, does not require the unverifiable assumption that the variance of importance sampling (IS) weights is upper bounded. In the class of Hessian-free and IS-free algorithms, ANPG beats the best-known sample complexity by a…
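The abstract's central idea, obtaining the natural policy gradient by running an accelerated gradient method rather than inverting a Fisher matrix, can be illustrated with a small sketch. The code below is not the paper's ANPG routine: it assumes a deterministic Nesterov iteration on the quadratic $\frac{1}{2} w^\top F w - w^\top g$, with a hypothetical empirical Fisher matrix `fisher` and gradient estimate `grad` built from random data, and the function name `npg_direction_nesterov` is made up for illustration. The paper's method uses stochastic gradients of this kind of objective instead of the exact ones used here.

```python
import numpy as np

def npg_direction_nesterov(fisher, grad, num_steps=500):
    """Approximately solve fisher @ w = grad with Nesterov's accelerated gradient.

    The solution is the natural gradient direction F^{-1} g, i.e. the minimizer
    of the quadratic 0.5 * w^T F w - w^T g (F assumed positive definite).
    """
    eigvals = np.linalg.eigvalsh(fisher)      # ascending eigenvalues
    mu, L = eigvals[0], eigvals[-1]           # strong convexity / smoothness constants
    step = 1.0 / L
    momentum = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))
    w = np.zeros_like(grad)
    y = w.copy()
    for _ in range(num_steps):
        w_next = y - step * (fisher @ y - grad)   # gradient step at the look-ahead point
        y = w_next + momentum * (w_next - w)      # momentum extrapolation
        w = w_next
    return w

# Hypothetical data: random vectors standing in for score functions and a gradient.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))
fisher = features.T @ features / 200          # empirical Fisher matrix (assumption)
grad = rng.normal(size=4)                     # stand-in policy gradient estimate

w_hat = npg_direction_nesterov(fisher, grad)
w_exact = np.linalg.solve(fisher, grad)       # direct solve, for comparison only
```

In this toy setting `w_hat` should agree with the direct solve `w_exact` to high precision. The point of the iterative form is that it needs only matrix-vector products, which is what makes a first-order, Hessian-free update possible.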
Taxonomy
Topics: Age of Information Optimization · Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics
