Recurrent Natural Policy Gradient for POMDPs

Semih Cayci; Atilla Eryilmaz

arXiv:2405.18221·math.OC·October 20, 2025

Open Access

TL;DR

This paper introduces a recurrent natural policy gradient method for POMDPs that combines RNNs with natural gradient techniques, providing theoretical guarantees and analyzing limitations due to long-term dependencies.

Contribution

It develops a natural actor-critic algorithm for POMDPs that uses RNNs in both its natural policy gradient and temporal difference learning components, and it analyzes the algorithm's guarantees and limitations.

Findings

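01. Provides non-asymptotic guarantees for the algorithm, including bounds on sample and iteration complexity to achieve global optimality up to function approximation.

02. Characterizes pathological cases where long-term dependencies hinder performance.

03. Uses the representational capacity of RNNs to address the non-stationarity of optimal policies in POMDPs while retaining the statistical and computational efficiency of natural gradient methods.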

Abstract

Solving partially observable Markov decision processes (POMDPs) remains a fundamental challenge in reinforcement learning (RL), primarily due to the curse of dimensionality induced by the non-stationarity of optimal policies. In this work, we study a natural actor-critic (NAC) algorithm that integrates recurrent neural network (RNN) architectures into a natural policy gradient (NPG) method and a temporal difference (TD) learning method. This framework leverages the representational capacity of RNNs to address non-stationarity in RL to solve POMDPs while retaining the statistical and computational efficiency of natural gradient methods in RL. We provide non-asymptotic theoretical guarantees for this method, including bounds on sample and iteration complexity to achieve global optimality up to function approximation. Additionally, we characterize pathological cases that stem from…
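To make the role of recurrence concrete, here is a minimal sketch of a history-dependent softmax policy, assuming a single tanh recurrent cell and NumPy. It is not the authors' implementation: the names `init_params` and `policy_probs`, the layer sizes, and the random initialization are all hypothetical, and the sketch covers only the forward pass (no natural gradient or TD update). It shows that the same current observation can lead to different action probabilities depending on what was observed earlier, which is the kind of non-stationary behavior a memoryless policy cannot express.

```python
import numpy as np


def init_params(obs_dim, hidden_dim, num_actions, seed=0):
    """Randomly initialize a tiny recurrent policy (hypothetical sizes and names)."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(hidden_dim)
    return {
        "W_in": rng.normal(0.0, scale, (hidden_dim, obs_dim)),
        "W_rec": rng.normal(0.0, scale, (hidden_dim, hidden_dim)),
        "W_out": rng.normal(0.0, scale, (num_actions, hidden_dim)),
    }


def policy_probs(params, observations):
    """Return action probabilities at every step of an observation sequence.

    The hidden state h summarizes the observation history, so the action
    distribution at step t depends on all observations up to t.
    """
    h = np.zeros(params["W_rec"].shape[0])
    probs = []
    for y in observations:
        # Recurrent update: mix the new observation with the previous state.
        h = np.tanh(params["W_in"] @ y + params["W_rec"] @ h)
        logits = params["W_out"] @ h
        logits = logits - logits.max()  # shift for numerical stability
        p = np.exp(logits)
        probs.append(p / p.sum())
    return np.array(probs)


# Two histories that end in the same observation but start differently.
params = init_params(obs_dim=2, hidden_dim=8, num_actions=3)
same_obs = np.array([1.0, 0.0])
history_a = np.stack([np.array([0.0, 1.0]), same_obs])
history_b = np.stack([np.array([1.0, 0.0]), same_obs])

probs_a = policy_probs(params, history_a)
probs_b = policy_probs(params, history_b)
```

Comparing `probs_a[-1]` with `probs_b[-1]` shows two different action distributions for the identical final observation, because the hidden state carries the earlier observation forward.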

Taxonomy

Topics: Optimization and Search Problems