Reinforcement Learning in Non-Markovian Environments

Siddharth Chandak; Pratik Shah; Vivek S Borkar; Parth Dodhia

arXiv:2211.01595 · eess.SY · February 15, 2024

Open Access

TL;DR

This paper studies reinforcement learning in non-Markovian environments. It pins down the error that non-Markovian observations introduce when Q-learning is applied, and proposes an autoencoder-based scheme for agent design that is tested numerically on partially observed environments.

Contribution

It proposes a formulation for RL in non-Markovian settings, related to the paradigm of Van Roy and coauthors, and develops an autoencoder-based method for recursively computing approximate sufficient statistics.

Findings

01. Error quantification due to non-Markovianity
02. Autoencoder-based agent design scheme
03. Numerical validation on partially observed environments

Abstract

Motivated by the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation and explicitly pin down the error caused by non-Markovianity of observations when the Q-learning algorithm is applied to this formulation. Based on this observation, we propose that the criterion for agent design should be to seek good approximations for certain conditional laws. Inspired by classical stochastic control, we show that our problem reduces to that of recursive computation of approximate sufficient statistics. This leads to an autoencoder-based scheme for agent design which is then numerically tested on partially observed reinforcement learning environments.
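To make the idea of running Q-learning on a recursively computed agent state concrete, here is a minimal sketch. It is not the authors' method: the learned autoencoder is replaced by a hand-written finite-window memory, and the environment `CueEnv`, the function `update_agent_state`, and all hyperparameters are hypothetical names and values chosen only for illustration. The toy task shows a cue once and rewards the agent for repeating it one step later, so the raw observation alone is non-Markovian.

```python
import random
from collections import defaultdict

BLANK = 2  # observation that carries no information about the cue


class CueEnv:
    """Hypothetical two-step task: a binary cue is shown at reset, then hidden.
    The action taken at the second step is rewarded if it equals the cue."""

    def __init__(self, rng):
        self.rng = rng

    def reset(self):
        self.cue = self.rng.randrange(2)
        self.t = 0
        return self.cue

    def step(self, action):
        self.t += 1
        if self.t == 1:  # first action is ignored; the cue disappears
            return BLANK, 0.0, False
        return BLANK, float(action == self.cue), True


def update_agent_state(state, action, observation, window):
    """Recursive agent-state update s' = f(s, a, y'): keep the last `window`
    observations. The action is unused here; it is kept in the signature
    because a general update may depend on it."""
    return (state + (observation,))[-window:]


def greedy(q, state):
    return max((0, 1), key=lambda a: q[state][a])


def q_learning(window, episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning where the table is indexed by the agent state."""
    rng = random.Random(seed)
    env = CueEnv(rng)
    q = defaultdict(lambda: [0.0, 0.0])
    for _ in range(episodes):
        state = update_agent_state((), None, env.reset(), window)
        done = False
        while not done:
            explore = rng.random() < epsilon
            action = rng.randrange(2) if explore else greedy(q, state)
            obs, reward, done = env.step(action)
            next_state = update_agent_state(state, action, obs, window)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def evaluate(q, window, episodes=1000, seed=1):
    """Average reward per episode of the greedy policy."""
    env = CueEnv(random.Random(seed))
    total = 0.0
    for _ in range(episodes):
        state = update_agent_state((), None, env.reset(), window)
        done = False
        while not done:
            action = greedy(q, state)
            obs, reward, done = env.step(action)
            state = update_agent_state(state, action, obs, window)
            total += reward
    return total / episodes


if __name__ == "__main__":
    for window in (1, 2):
        print(window, evaluate(q_learning(window), window))
```

With `window=1` the agent state at decision time is just the blank observation, so the greedy policy should do no better than chance. With `window=2` the agent state retains the cue, which is a sufficient statistic for this toy task, and the same Q-learning loop can recover the rewarded action.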

Taxonomy

Topics: Statistical and Computational Modeling · Neural Networks and Applications · Reinforcement Learning in Robotics

Methods: Q-Learning