Reinforcement Learning in Non-Markovian Environments
Siddharth Chandak, Pratik Shah, Vivek S Borkar, Parth Dodhia

TL;DR
This paper investigates reinforcement learning in non-Markovian environments, identifying errors caused by non-Markovian observations and proposing an autoencoder-based approach for agent design, with promising numerical results.
Contribution
It introduces a new formulation for RL in non-Markovian settings and develops an autoencoder-based method for approximating sufficient statistics.
Findings
Error quantification due to non-Markovianity
Autoencoder-based agent design scheme
Numerical validation on partially observed environments
Abstract
Motivated by the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation and explicitly pin down the error caused by non-Markovianity of observations when the Q-learning algorithm is applied to this formulation. Based on this observation, we propose that the criterion for agent design should be to seek good approximations for certain conditional laws. Inspired by classical stochastic control, we show that our problem reduces to that of recursive computation of approximate sufficient statistics. This leads to an autoencoder-based scheme for agent design, which is then numerically tested on partially observed reinforcement learning environments.
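As a rough illustration of the recursion described in the abstract, the sketch below applies a tabular Q-learning update to an agent state that is computed recursively from the previous agent state, the action, and the new observation. It is a sketch under stated assumptions, not the paper's method: the hand-coded `update_agent_state` (a window of the last two observations) only stands in for the learned autoencoder-based statistic, and the names `update_agent_state`, `q_update`, `n_obs`, `alpha`, and `gamma`, as well as the discounted-reward setting, are hypothetical.

```python
def update_agent_state(prev_state, action, observation, n_obs):
    """Recursive agent-state update s' = f(s, a, y).

    Assumption: the agent state encodes the last two observations as
    prev_obs * n_obs + cur_obs. The action is accepted to match the
    general recursive form but is ignored in this toy stand-in.
    """
    cur_obs = prev_state % n_obs          # most recent observation stored in s
    return cur_obs * n_obs + observation  # shift the window by one step


def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step, with the agent state used in place of
    the (unobserved) environment state."""
    td_target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q


# Toy usage: binary observations, two actions, four possible agent states.
n_obs, n_actions = 2, 2
Q = [[0.0] * n_actions for _ in range(n_obs * n_obs)]
s = 0
s_next = update_agent_state(s, action=1, observation=1, n_obs=n_obs)
q_update(Q, s, 1, 1.0, s_next)
```

Because the agent state here is generally not a sufficient statistic of the history, the values learned this way carry the kind of error from non-Markovianity that the abstract refers to; a better recursive statistic would be expected to shrink it.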
Taxonomy
Topics: Statistical and Computational Modeling · Neural Networks and Applications · Reinforcement Learning in Robotics
Methods: Q-Learning
