Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear Function Approximation
Zaiwei Chen, Sajad Khodadadian, Siva Theja Maguluri

TL;DR
This paper introduces a new off-policy natural actor-critic algorithm with linear function approximation, achieving improved sample complexity and finite-sample convergence bounds, addressing divergence issues in off-policy policy evaluation.
Contribution
It presents a novel off-policy natural actor-critic method with linear function approximation and establishes the first $\tilde{O}(\frac{1}{\epsilon^3})$ sample complexity for such algorithms.
Findings
Achieves $\tilde{O}(\frac{1}{\epsilon^3})$ sample complexity.
Develops a critic with $n$-step TD-learning for stability.
Provides finite-sample bounds under light exploration assumptions.
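
The $n$-step TD critic named in the findings can be illustrated with a generic importance-sampled $n$-step TD update for a linear Q-function. This is a minimal sketch under stated assumptions, not necessarily the paper's exact update: the names `n_step_td_update`, `phi` (feature map), and `rho` (ratio of target-policy to behavior-policy action probabilities) are hypothetical, and the trajectory is assumed to be a list of `(state, action, reward)` tuples of length `n + 1` collected under the behavior policy.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def n_step_td_update(w, traj, phi, rho, gamma, alpha, n):
    """Apply one n-step off-policy TD update to the weight vector w.

    traj  : list of (state, action, reward) tuples, length n + 1 (assumed)
    phi   : feature map, phi(s, a) -> list of floats (hypothetical helper)
    rho   : importance ratio, rho(s, a) -> float (hypothetical helper)
    """
    correction = 0.0
    weight = 1.0  # discount times product of importance ratios for steps 1..i
    for i in range(n):
        s, a, r = traj[i]
        s_next, a_next, _ = traj[i + 1]
        if i > 0:
            weight *= gamma * rho(s, a)
        # One-step TD error with an importance-weighted bootstrap term.
        td = (r
              + gamma * rho(s_next, a_next) * dot(phi(s_next, a_next), w)
              - dot(phi(s, a), w))
        correction += weight * td
    s0, a0, _ = traj[0]
    f0 = phi(s0, a0)
    return [wi + alpha * correction * fi for wi, fi in zip(w, f0)]


# Tiny demo: two states, one-hot state features, on-policy ratios (rho = 1).
one_hot = lambda s, a: [1.0, 0.0] if s == 0 else [0.0, 1.0]
demo_traj = [(0, 0, 1.0), (1, 0, 2.0), (0, 0, 0.0)]
w_new = n_step_td_update([0.0, 0.0], demo_traj, one_hot,
                         lambda s, a: 1.0, gamma=0.9, alpha=0.5, n=2)
# w_new[0] equals 0.5 * (1 + 0.9 * 2) = 1.4 up to floating-point rounding.
```

With `n=1` the update reduces to ordinary one-step off-policy TD; larger `n` accumulates more discounted, importance-weighted TD errors before bootstrapping, which is the knob the findings credit for stability.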
Abstract
In this paper, we develop a novel variant of the off-policy natural actor-critic algorithm with linear function approximation and we establish a sample complexity of $\tilde{O}(\frac{1}{\epsilon^3})$, outperforming all the previously known convergence bounds of such algorithms. In order to overcome the divergence due to the deadly triad in off-policy policy evaluation under function approximation, we develop a critic that employs the $n$-step TD-learning algorithm with a properly chosen $n$. We present finite-sample convergence bounds on this critic under both constant and diminishing step sizes, which are of independent interest. Furthermore, we develop a variant of natural policy gradient under function approximation, with an improved convergence rate of $O(1/T)$ after $T$ iterations. Combining the finite-sample error bounds of the actor and the critic, we obtain the …
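
The actor side mentioned in the abstract can be illustrated with the standard multiplicative form of natural policy gradient under a softmax parameterization, where each action probability is reweighted by the exponentiated Q estimate. This is a minimal sketch under assumptions: the function name `npg_softmax_step`, the dictionary layout, and the step size `beta` are hypothetical, and the Q estimates are taken as given (for example, from a linear critic) rather than computed here. The paper's variant may differ in its details.

```python
import math


def npg_softmax_step(policy, q_values, beta):
    """One natural-policy-gradient style update in multiplicative form.

    policy   : dict mapping state -> list of action probabilities
    q_values : dict mapping state -> list of Q estimates (assumed given)
    beta     : actor step size (hypothetical name)
    """
    new_policy = {}
    for s, probs in policy.items():
        q = q_values[s]
        q_max = max(q)  # subtract the max for numerical stability
        unnorm = [p * math.exp(beta * (qa - q_max)) for p, qa in zip(probs, q)]
        z = sum(unnorm)
        new_policy[s] = [u / z for u in unnorm]
    return new_policy


# Demo: one state, two actions, uniform start; action 0 has the higher Q value.
updated = npg_softmax_step({0: [0.5, 0.5]}, {0: [1.0, 0.0]}, beta=math.log(3))
# updated[0] is approximately [0.75, 0.25]: probability shifts toward action 0.
```

Subtracting the per-state maximum before exponentiating does not change the normalized result; it only avoids overflow for large `beta`.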
Taxonomy
Topics: Reinforcement Learning in Robotics · Advancements in Semiconductor Devices and Circuit Design · Ferroelectric and Negative Capacitance Devices
