On learning history based policies for controlling Markov decision processes
Gandharv Patil, Aditya Mahajan, Doina Precup

TL;DR
This paper develops a theoretical framework for analyzing history-based reinforcement learning algorithms controlling MDPs, introduces a practical algorithm, and evaluates its performance on continuous control tasks.
Contribution
It provides a formal framework for analyzing history-based RL algorithms for MDP control, an area the authors note has received little formal analysis, and proposes a new practical algorithm based on this framework.
Findings
The proposed algorithm performs well on continuous control tasks.
History-based feature abstraction improves RL control in MDPs.
The framework offers insights into the behavior of history-based RL methods.
Abstract
Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as recurrent neural nets or history-based state abstraction, perform better than their memory-less counterparts, due to the fact that function approximation in Markov decision processes (MDP) can be viewed as inducing a partially observable MDP. However, there has been little formal analysis of such history-based algorithms, as most existing frameworks focus exclusively on memory-less features. In this paper, we introduce a theoretical framework for studying the behaviour of RL algorithms that learn to control an MDP using history-based feature abstraction mappings. Furthermore, we use this framework to design a practical RL algorithm and we numerically evaluate its effectiveness on a set of continuous control tasks.
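The claim that function approximation can induce partial observability can be illustrated with a toy example. The sketch below is not the paper's algorithm: the four-state corridor, the feature labels, and the names `memoryless_feature` and `history_features` are all hypothetical, chosen only to show how a memory-less feature map can alias distinct states while a short history of the same features separates them.

```python
from collections import deque

# Hypothetical 4-state corridor MDP: the agent starts in state 0 and always moves right.
# The feature map below is an assumption for illustration, not taken from the paper.
ALIASED_FEATURES = {0: "A", 1: "B", 2: "B", 3: "C"}


def memoryless_feature(state):
    """Memory-less abstraction: states 1 and 2 map to the same feature."""
    return ALIASED_FEATURES[state]


def history_features(states, window=2):
    """History-based abstraction: the last `window` memory-less features."""
    buf = deque(maxlen=window)
    out = []
    for s in states:
        buf.append(memoryless_feature(s))
        out.append(tuple(buf))
    return out


trajectory = [0, 1, 2, 3]
flat = [memoryless_feature(s) for s in trajectory]
hist = history_features(trajectory)

print(flat)  # states 1 and 2 are indistinguishable here
print(hist)  # the two-step history tells them apart
```

With the memory-less map, an agent sees the same feature in states 1 and 2, so from its point of view the problem is partially observable even though the underlying process is an MDP. The sliding window is only one simple choice of history compression; a recurrent network would play the same role with a learned update.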
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Neural Networks and Applications
