Periodic agent-state based Q-learning for POMDPs

Amit Sinha; Matthieu Geist; Aditya Mahajan

arXiv:2407.06121 · cs.LG · October 30, 2024

TL;DR

This paper introduces PASQL (periodic agent-state based Q-learning), a new Q-learning algorithm for POMDPs that learns cyclic policies. Because agent states are not Markovian, such periodic policies can outperform the stationary policies learned by standard Q-learning.

Contribution

The paper proposes PASQL, a new RL algorithm for POMDPs that learns periodic policies using agent states, with theoretical convergence guarantees and empirical validation.
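To make the idea concrete, here is a minimal tabular sketch in Python. It assumes, as we read the description above, that a policy with period L keeps L Q-tables Q_0, ..., Q_{L-1}, uses Q_{t mod L} at time t, and bootstraps each update from the next phase's table Q_{(t+1) mod L}. The toggling environment, the agent state (simply the latest observation), the function names `toggle_env_step`, `pasql` and `evaluate`, and all hyperparameters are hypothetical choices for illustration, not the authors' code or experimental setup. With period 1 the sketch reduces to standard Q-learning on the agent state.

```python
import random
from collections import defaultdict

N_ACTIONS = 2


def toggle_env_step(s, a):
    """Hypothetical toy POMDP (not from the paper).

    The hidden state alternates 0, 1, 0, 1, ... regardless of the action.
    The observation is always 0, so it carries no information about s.
    Reward is 1 if the action matches the hidden state, else 0.
    """
    reward = 1.0 if a == s else 0.0
    return 1 - s, 0, reward  # (next hidden state, next observation, reward)


def pasql(period, steps=20_000, alpha=0.1, gamma=0.9, seed=0):
    """Sketch of periodic agent-state Q-learning with one Q-table per phase.

    The agent state z is assumed to be the latest observation.
    With period=1 this is ordinary Q-learning on the agent state.
    """
    rng = random.Random(seed)
    # Q[l][z] is the list of action values for phase l and agent state z.
    Q = [defaultdict(lambda: [0.0] * N_ACTIONS) for _ in range(period)]
    s, z = 0, 0  # initial hidden state and initial agent state
    for t in range(steps):
        phase, next_phase = t % period, (t + 1) % period
        a = rng.randrange(N_ACTIONS)  # uniform exploratory behavior policy
        s, z_next, r = toggle_env_step(s, a)
        # Bootstrap from the Q-table of the *next* phase, not the current one.
        target = r + gamma * max(Q[next_phase][z_next])
        Q[phase][z][a] += alpha * (target - Q[phase][z][a])
        z = z_next
    return Q


def evaluate(Q, period, horizon=100):
    """Average reward of the greedy periodic policy argmax_a Q[t mod period][z_t][a]."""
    s, z, total = 0, 0, 0.0
    for t in range(horizon):
        values = Q[t % period][z]
        a = max(range(N_ACTIONS), key=lambda b: values[b])
        s, z, r = toggle_env_step(s, a)
        total += r
    return total / horizon


if __name__ == "__main__":
    for L in (1, 2):
        print(f"period {L}: average reward {evaluate(pasql(period=L), period=L)}")
    # period 1: average reward 0.5
    # period 2: average reward 1.0
```

In this toy problem the observation is uninformative, so every stationary agent-state policy earns an average reward of 0.5 per step, while the learned period-2 policy alternates its actions in step with the hidden state and earns 1.0.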

Findings

1. PASQL converges to cyclic policies.
2. Periodic agent-state based policies can outperform stationary ones in POMDPs.
3. Numerical experiments demonstrate the benefits of PASQL.

Abstract

The standard approach for Partially Observable Markov Decision Processes (POMDPs) is to convert them to a fully observed belief-state MDP. However, the belief state depends on the system model and is therefore not viable in reinforcement learning (RL) settings. A widely used alternative is to use an agent state, which is a model-free, recursively updateable function of the observation history. Examples include frame stacking and recurrent neural networks. Since the agent state is model-free, it is used to adapt standard RL algorithms to POMDPs. However, standard RL algorithms like Q-learning learn a stationary policy. Our main thesis, which we illustrate via examples, is that because the agent state does not satisfy the Markov property, non-stationary agent-state based policies can outperform stationary ones. To leverage this feature, we propose PASQL (periodic agent-state based…
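Frame stacking, one of the agent states mentioned above, can be sketched as follows. The sketch assumes the agent state is the tuple of the k most recent observations, padded with a placeholder before enough observations arrive, and updated recursively as z_{t+1} = φ(z_t, y_{t+1}) without any knowledge of the system model. The helper names and the `PAD` value are hypothetical, and some frame-stacking variants also include past actions, which this sketch omits. Because tuples are hashable, such agent states can index a tabular Q-function directly.

```python
from collections.abc import Hashable, Sequence
from typing import Tuple

PAD = None  # hypothetical placeholder meaning "no observation yet"


def _last_k(t: Tuple, k: int) -> Tuple:
    """Last k entries of t (returns () when k == 0)."""
    return t[len(t) - k:]


def initial_agent_state(k: int) -> Tuple:
    """Agent state before any observation: k placeholders."""
    return (PAD,) * k


def update_agent_state(z: Tuple, obs: Hashable, k: int) -> Tuple:
    """Recursive, model-free update: append the new observation, keep the last k."""
    return _last_k(z + (obs,), k)


def agent_state_from_history(history: Sequence[Hashable], k: int) -> Tuple:
    """The same agent state computed directly from the full observation history."""
    return _last_k((PAD,) * k + tuple(history), k)


if __name__ == "__main__":
    history = ["y0", "y1", "y2", "y3"]
    z = initial_agent_state(3)
    for y in history:
        z = update_agent_state(z, y, 3)
    print(z)  # ('y1', 'y2', 'y3')
    # The recursive update agrees with the function of the full history.
    assert z == agent_state_from_history(history, 3)
```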

Videos

Periodic agent-state based Q-learning for POMDPs · SlidesLive

Taxonomy

Topics: Neural Networks and Reservoir Computing · Elevator Systems and Control · Neural Networks and Applications

Methods: Q-Learning