Periodic agent-state based Q-learning for POMDPs
Amit Sinha, Matthieu Geist, Aditya Mahajan

TL;DR
This paper introduces PASQL (periodic agent-state based Q-learning), a Q-learning algorithm for POMDPs that learns periodic (cyclic) policies over agent states. Because agent states are not Markovian, these periodic policies can outperform stationary ones.
Contribution
The paper proposes PASQL, a new RL algorithm for POMDPs that learns periodic policies using agent states, with theoretical convergence guarantees and empirical validation.
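A minimal tabular sketch of the periodic idea, written for illustration and not taken from the authors' code. It assumes PASQL keeps one Q-table per phase l = t mod L, acts greedily with respect to the table of the current phase, and bootstraps each update from the table of the next phase. The helper names (`pasql_update`, `greedy`, `train_toy`) and the toy environment are hypothetical. In that environment a hidden state alternates 0, 1, 0, ..., the observation carries no information, and the reward is 1 when the action matches the hidden state.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)

def pasql_update(Q, t, z, a, r, z_next, actions, alpha=0.1, gamma=0.9):
    """One tabular periodic Q-learning step (hypothetical sketch).

    Q is a list of L tables, one per phase. The table for phase t % L is moved
    toward a target that bootstraps from the table for phase (t + 1) % L.
    """
    period = len(Q)
    l, l_next = t % period, (t + 1) % period
    target = r + gamma * max(Q[l_next][(z_next, b)] for b in actions)
    Q[l][(z, a)] += alpha * (target - Q[l][(z, a)])

def greedy(Q, t, z, actions):
    """Periodic greedy policy: act greedily w.r.t. the table for phase t % L."""
    table = Q[t % len(Q)]
    return max(actions, key=lambda b: table[(z, b)])

def train_toy(period, steps=20_000, epsilon=0.2, seed=0):
    """Train on a hypothetical toy POMDP: the hidden state flips 0, 1, 0, ...;
    the agent state z is constant; the reward is 1 iff the action equals the state."""
    rng = random.Random(seed)
    Q = [defaultdict(float) for _ in range(period)]
    s, z = 0, 0
    for t in range(steps):
        if rng.random() < epsilon:
            a = rng.choice(ACTIONS)          # epsilon-greedy exploration
        else:
            a = greedy(Q, t, z, ACTIONS)
        r = 1.0 if a == s else 0.0
        s = 1 - s                            # hidden state flips every step
        pasql_update(Q, t, z, a, r, z, ACTIONS)  # next agent state equals z
    return Q

Q = train_toy(period=2)
print([greedy(Q, l, 0, ACTIONS) for l in range(2)])  # [0, 1]
```

With `period=2` the learned policy plays action 0 on even steps and action 1 on odd steps. A single stationary table over the same constant agent state cannot represent that behaviour.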
Findings
PASQL converges to cyclic policies.
Periodic agent-state based policies can outperform stationary ones in POMDPs, because the agent state is not Markovian.
Numerical experiments demonstrate the benefits of PASQL.
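To see why a periodic policy can beat every stationary one, consider a toy POMDP of our own construction; it does not appear in the paper. The hidden state alternates 0, 1, 0, 1, ..., the observation is constant, and the reward is 1 when the action matches the hidden state. A stationary agent-state based policy sees the same agent state at every step, so it must use the same action distribution throughout and earns 1/2 on average. A period-2 policy earns 1. The sketch below computes these expected values exactly with `fractions.Fraction`. All function names are hypothetical.

```python
from fractions import Fraction

def expected_avg_reward(prob_action0, horizon=10):
    """Exact expected average reward on a hypothetical toy POMDP.

    The hidden state is s_t = t % 2 and the observation is constant, so a
    policy can only depend on t. prob_action0(t) is P(a_t = 0); the reward is
    1 when a_t equals s_t and 0 otherwise.
    """
    total = Fraction(0)
    for t in range(horizon):
        p0 = Fraction(prob_action0(t))
        total += p0 if t % 2 == 0 else 1 - p0   # P(a_t == s_t)
    return total / horizon

def always_0(t):       # stationary deterministic policy
    return 1

def mixed(t):          # stationary randomized policy, P(a = 0) = 1/3
    return Fraction(1, 3)

def alternate(t):      # period-2 policy: 0, 1, 0, 1, ...
    return 1 if t % 2 == 0 else 0

for policy in (always_0, mixed, alternate):
    print(policy.__name__, expected_avg_reward(policy))
# always_0 1/2
# mixed 1/2
# alternate 1
```

Any fixed mixing probability gives 1/2 here, because the per-step success probabilities p and 1 - p average out over two steps.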
Abstract
The standard approach for Partially Observable Markov Decision Processes (POMDPs) is to convert them to a fully observed belief-state MDP. However, the belief state depends on the system model and is therefore not viable in reinforcement learning (RL) settings. A widely used alternative is to use an agent state, which is a model-free, recursively updateable function of the observation history. Examples include frame stacking and recurrent neural networks. Since the agent state is model-free, it can be used to adapt standard RL algorithms to POMDPs. However, standard RL algorithms like Q-learning learn a stationary policy. Our main thesis, which we illustrate via examples, is that because the agent state does not satisfy the Markov property, non-stationary agent-state based policies can outperform stationary ones. To leverage this feature, we propose PASQL (periodic agent-state based…
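Frame stacking, one of the agent-state examples mentioned above, fits the recursive form z_{t+1} = phi(z_t, y_{t+1}). The new agent state is computed from the previous one and the latest observation alone, with no model of the POMDP. This is a minimal sketch; the function name and the choice of k are illustrative. Some agent-state updates also take the previous action as input, which this one ignores.

```python
from typing import Hashable, Tuple

AgentState = Tuple[Hashable, ...]

def frame_stack_update(z: AgentState, obs: Hashable, k: int) -> AgentState:
    """Recursive agent-state update for frame stacking (hypothetical helper).

    Appends the newest observation and keeps only the k most recent ones.
    """
    return (z + (obs,))[-k:]

z: AgentState = ()
for y in ["a", "b", "c", "d"]:
    z = frame_stack_update(z, y, k=2)
print(z)  # ('c', 'd')
```

Two different histories that end in the same k observations map to the same agent state. That is why such a state is generally not Markov.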
Taxonomy
Topics: Neural Networks and Reservoir Computing · Elevator Systems and Control · Neural Networks and Applications
Methods: Q-Learning
