Improved Sample Complexity Analysis of Natural Policy Gradient Algorithm with General Parameterization for Infinite Horizon Discounted Reward Markov Decision Processes

Washim Uddin Mondal; Vaneet Aggarwal

arXiv:2310.11677·cs.LG·February 6, 2024·1 cites

TL;DR

This paper introduces ANPG, an accelerated natural policy gradient algorithm that improves sample complexity for infinite horizon discounted reward MDPs without requiring variance bounds on importance sampling weights.

Contribution

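The paper proposes ANPG, a first-order natural policy gradient method that computes its update direction with accelerated stochastic gradient descent. It achieves $O(\epsilon^{-2})$ sample complexity under general parameterization without the unverifiable assumption that the variance of importance sampling weights is bounded.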

Findings

01

Achieves $O(\epsilon^{-2})$ sample complexity.

02

Improves previous bounds by a $\log(\frac{1}{\epsilon})$ factor.

03

Matches state-of-the-art iteration complexity without variance assumptions.

Abstract

We consider the problem of designing sample-efficient learning algorithms for infinite horizon discounted reward Markov Decision Processes. Specifically, we propose the Accelerated Natural Policy Gradient (ANPG) algorithm, which utilizes an accelerated stochastic gradient descent process to obtain the natural policy gradient. ANPG achieves $O(\epsilon^{-2})$ sample complexity and $O(\epsilon^{-1})$ iteration complexity with general parameterization, where $\epsilon$ defines the optimality error. This improves the state-of-the-art sample complexity by a $\log(\frac{1}{\epsilon})$ factor. ANPG is a first-order algorithm and, unlike some existing literature, does not require the unverifiable assumption that the variance of importance sampling (IS) weights is upper bounded. In the class of Hessian-free and IS-free algorithms, ANPG beats the best-known sample complexity by a…
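The idea of obtaining the natural policy gradient through an accelerated stochastic gradient descent process can be illustrated with a small sketch. It assumes the standard compatible function approximation view, in which the NPG direction minimizes $\frac{1}{2}\mathbb{E}[(w^\top \nabla_\theta \log \pi_\theta(a|s) - A(s,a))^2]$, and it uses plain Nesterov momentum on mini-batches. The synthetic score vectors and advantages, the function names, and the step size, momentum, and batch size are all illustrative assumptions, not the paper's exact procedure or constants.

```python
import numpy as np


def sample_batch(rng, w_true, batch_size, noise_std=0.1):
    """Draw synthetic (score, advantage) pairs.

    Assumption: scores stand in for grad log pi(a|s) and are standard normal,
    so the Fisher matrix is the identity and the minimizer is w_true.
    """
    d = w_true.shape[0]
    scores = rng.standard_normal((batch_size, d))
    advantages = scores @ w_true + noise_std * rng.standard_normal(batch_size)
    return scores, advantages


def stochastic_grad(w, scores, advantages):
    """Mini-batch gradient of 0.5 * mean((scores @ w - advantages) ** 2)."""
    residual = scores @ w - advantages
    return scores.T @ residual / scores.shape[0]


def accelerated_sgd_npg_direction(rng, w_true, num_steps=2000, batch_size=32,
                                  step_size=0.1, momentum=0.5):
    """Estimate the NPG direction with Nesterov-accelerated SGD (hypothetical helper)."""
    w = np.zeros_like(w_true)
    w_prev = w.copy()
    for _ in range(num_steps):
        # Look ahead along the previous displacement, then take a gradient step there.
        lookahead = w + momentum * (w - w_prev)
        scores, advantages = sample_batch(rng, w_true, batch_size)
        grad = stochastic_grad(lookahead, scores, advantages)
        w_prev, w = w, lookahead - step_size * grad
    return w


rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])  # illustrative target direction
w_hat = accelerated_sgd_npg_direction(rng, w_true)
error = float(np.linalg.norm(w_hat - w_true))
print("estimation error:", error)
```

In a full policy gradient loop, the estimate `w_hat` would serve as the update direction for the policy parameters, for example `theta = theta + eta * w_hat` with a learning rate `eta`. Only first-order information and on-policy samples are used here, so no Hessian or importance sampling weights appear.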

Taxonomy

Topics: Age of Information Optimization · Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics