Finite-Time Analysis of Entropy-Regularized Neural Natural Actor-Critic Algorithm
Semih Cayci, Niao He, R. Srikant

TL;DR
This paper provides a finite-time theoretical analysis of neural natural actor-critic algorithms, highlighting the roles of entropy regularization, averaging, and neural network approximation in ensuring stability, exploration, and optimality in large state space MDPs.
Contribution
The paper offers the first finite-time analysis of neural NAC, showing how entropy regularization, gradient clipping, and averaging contribute to sample efficiency and policy optimality.
Findings
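Entropy regularization and averaging ensure stability by providing enough exploration to avoid near-deterministic, strictly suboptimal policies.
Regularization yields sharp sample complexity and network width bounds in the regularized MDPs.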
Uniform approximation power of neural networks is crucial for global optimality.
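To make the first finding concrete, here is a minimal sketch of entropy regularization in a single-state problem (a bandit), which is a simplification and not the paper's setting. It relies on the standard fact that maximizing expected reward plus lam times the policy entropy over the probability simplex gives a softmax policy. The reward vector, the temperature name lam, and all function names are illustrative assumptions.

```python
import math

def softmax_policy(rewards, lam):
    """Maximizer of sum_a pi(a) * r(a) + lam * H(pi) over the simplex."""
    m = max(rewards)  # subtract the max for numerical stability
    weights = [math.exp((r - m) / lam) for r in rewards]
    z = sum(weights)
    return [w / z for w in weights]

def entropy(pi):
    """Shannon entropy with the convention 0 * log 0 = 0."""
    return -sum(p * math.log(p) for p in pi if p > 0.0)

def regularized_value(pi, rewards, lam):
    """Expected reward plus the entropy bonus weighted by lam."""
    return sum(p * r for p, r in zip(pi, rewards)) + lam * entropy(pi)

# Hypothetical rewards and temperature, chosen only for illustration.
rewards = [1.0, 0.5, 0.0]
lam = 0.5

pi_star = softmax_policy(rewards, lam)
greedy = [1.0, 0.0, 0.0]  # deterministic policy on the best action

print("regularized optimum:", pi_star)
print("value at optimum:", regularized_value(pi_star, rewards, lam))
print("value at greedy :", regularized_value(greedy, rewards, lam))
```

Every action keeps strictly positive probability under the regularized optimum, so the policy stays away from the deterministic corner of the simplex. This is the kind of built-in exploration the finding refers to, shown here only in the simplest possible case.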
Abstract
Natural actor-critic (NAC) and its variants, equipped with the representation power of neural networks, have demonstrated impressive empirical success in solving Markov decision problems with large state spaces. In this paper, we present a finite-time analysis of NAC with neural network approximation, and identify the roles of neural networks, regularization and optimization techniques (e.g., gradient clipping and averaging) to achieve provably good performance in terms of sample complexity, iteration complexity and overparametrization bounds for the actor and the critic. In particular, we prove that (i) entropy regularization and averaging ensure stability by providing sufficient exploration to avoid near-deterministic and strictly suboptimal policies and (ii) regularization leads to sharp sample complexity and network width bounds in the regularized MDPs, yielding a favorable…
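The abstract names gradient clipping and averaging as the optimization techniques under study but does not spell out their exact form here. The sketch below shows the generic versions of both, under the assumption that clipping means rescaling a gradient onto a norm ball and averaging means taking the uniform mean of the iterates. The paper's actual scheme may differ, and the function names are hypothetical.

```python
import math

def clip_by_norm(grad, max_norm):
    """Rescale grad so its Euclidean norm is at most max_norm (assumed clipping rule)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]

def average_iterates(iterates):
    """Uniform average of a list of equal-length parameter vectors (assumed averaging rule)."""
    n = len(iterates)
    dim = len(iterates[0])
    return [sum(x[i] for x in iterates) / n for i in range(dim)]

# Illustrative values only.
clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)
averaged = average_iterates([[0.0, 2.0], [2.0, 4.0]])
print(clipped, averaged)
```

Clipping caps the size of any single update, while averaging smooths out noise across updates. Both are common stabilizers in stochastic optimization in general, independent of this paper.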
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Reinforcement Learning in Robotics
Methods: Entropy Regularization · Gradient Clipping
