Recurrent Natural Policy Gradient for POMDPs
Semih Cayci, Atilla Eryilmaz

TL;DR
This paper introduces a recurrent natural policy gradient method for POMDPs that combines RNNs with natural gradient techniques, providing theoretical guarantees and analyzing limitations due to long-term dependencies.
Contribution
Develops a recurrent natural actor-critic algorithm for POMDPs that integrates RNNs into natural policy gradient and temporal difference learning, with theoretical analysis and insights into its limitations.
Findings
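Provides non-asymptotic guarantees for the algorithm, including bounds on sample and iteration complexity to achieve global optimality up to function approximation.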
Characterizes cases where long-term dependencies hinder performance.
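Uses the representational capacity of RNNs to address the non-stationarity of optimal policies in POMDPs while retaining the statistical and computational efficiency of natural gradient methods.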
Abstract
Solving partially observable Markov decision processes (POMDPs) remains a fundamental challenge in reinforcement learning (RL), primarily due to the curse of dimensionality induced by the non-stationarity of optimal policies. In this work, we study a natural actor-critic (NAC) algorithm that integrates recurrent neural network (RNN) architectures into a natural policy gradient (NPG) method and a temporal difference (TD) learning method. This framework leverages the representational capacity of RNNs to address non-stationarity in RL to solve POMDPs while retaining the statistical and computational efficiency of natural gradient methods in RL. We provide non-asymptotic theoretical guarantees for this method, including bounds on sample and iteration complexity to achieve global optimality up to function approximation. Additionally, we characterize pathological cases that stem from…
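The two ingredients named in the abstract, a recurrent policy and a natural gradient step, can be illustrated with a minimal sketch. This is not the paper's algorithm: the Elman-style architecture, the damping term, and the names `rnn_policy_probs` and `natural_gradient_direction` are assumptions chosen for illustration, and the advantage values are taken as given rather than estimated by a TD-learning critic.

```python
import numpy as np

def rnn_policy_probs(params, observations):
    """Illustrative Elman-style RNN policy (an assumption, not the paper's exact model).

    Folds the observation history into a hidden state, then applies a
    softmax over actions, so the action distribution depends on the history.
    """
    W_in, W_rec, W_out = params
    h = np.zeros(W_rec.shape[0])
    for obs in observations:
        h = np.tanh(W_in @ obs + W_rec @ h)
    logits = W_out @ h
    logits = logits - logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def natural_gradient_direction(scores, advantages, damping=1e-3):
    """Solve (F + damping * I) w = g for the natural gradient direction w.

    scores:     (n, d) array, row i is grad_theta log pi(a_i | history_i)
    advantages: (n,) array of advantage estimates (assumed given here)
    F is the empirical Fisher matrix and g the policy gradient estimate.
    The damping term is a common regularizer and is an assumption of this sketch.
    """
    scores = np.asarray(scores, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    n, d = scores.shape
    fisher = scores.T @ scores / n
    grad = scores.T @ advantages / n
    return np.linalg.solve(fisher + damping * np.eye(d), grad)
```

In a full actor-critic loop, the score vectors would come from differentiating the log-probabilities of the recurrent policy with respect to its parameters, and the policy parameters would be moved along the returned direction with some step size.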
Taxonomy
Topics: Optimization and Search Problems
