Learning Finite-State Controllers for Partially Observable Environments
Nicolas Meuleau, Leonid Peshkin, Kee-Eung Kim, Leslie Pack Kaelbling

TL;DR
This paper extends Baird and Moore's VAPS algorithm to learn finite-state controllers for partially observable environments. Because the method performs stochastic gradient descent, it can be shown to converge to a locally optimal controller.
Contribution
The paper extends the VAPS algorithm to learn general finite-state automata, yielding a stochastic gradient descent method for partially observable MDPs (POMDPs).
Findings
Stochastic gradient descent can outperform exact gradient descent under certain conditions.
The algorithm effectively extracts useful information from past observations.
Empirical results compare the performance of stochastic and exact gradient descent in partially observable environments.
Abstract
Reactive (memoryless) policies are sufficient in completely observable Markov decision processes (MDPs), but some kind of memory is usually necessary for optimal control of a partially observable MDP. Policies with finite memory can be represented as finite-state automata. In this paper, we extend Baird and Moore's VAPS algorithm to the problem of learning general finite-state automata. Because it performs stochastic gradient descent, this algorithm can be shown to converge to a locally optimal finite-state controller. We provide the details of the algorithm and then consider the question of under what conditions stochastic gradient descent will outperform exact gradient descent. We conclude with empirical results comparing the performance of stochastic and exact gradient descent, and showing the ability of our algorithm to extract the useful information contained in the sequence of…
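To make the idea of a policy represented as a finite-state automaton concrete, the following is a minimal sketch of a stochastic finite-state controller. It is not the paper's implementation: the class name `FiniteStateController`, the softmax parameterization, the weight-table layout, and the order of the memory update and action choice are all assumptions made for illustration. The weight tables are the kind of parameters a gradient method could adjust.

```python
import math
import random


def softmax(weights):
    """Turn a list of real-valued weights into a probability distribution."""
    m = max(weights)  # subtract the max for numerical stability
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


class FiniteStateController:
    """Hypothetical stochastic controller with a finite set of memory nodes.

    Assumed layout (illustrative only):
      action_w[node][action]     -> weights for choosing an action in a node
      trans_w[node][obs][node2]  -> weights for moving to node2 after seeing obs
    """

    def __init__(self, n_nodes, n_obs, n_actions, seed=0):
        self.rng = random.Random(seed)
        self.n_nodes = n_nodes
        self.action_w = [[0.0] * n_actions for _ in range(n_nodes)]
        self.trans_w = [
            [[0.0] * n_nodes for _ in range(n_obs)] for _ in range(n_nodes)
        ]
        self.node = 0  # current memory node

    def action_probs(self):
        return softmax(self.action_w[self.node])

    def transition_probs(self, obs):
        return softmax(self.trans_w[self.node][obs])

    def act(self):
        """Sample an action from the current memory node's distribution."""
        probs = self.action_probs()
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def observe(self, obs):
        """Sample the next memory node given the latest observation."""
        probs = self.transition_probs(obs)
        self.node = self.rng.choices(range(self.n_nodes), weights=probs)[0]


# Usage: a 2-node controller that remembers having seen observation 1.
fsc = FiniteStateController(n_nodes=2, n_obs=2, n_actions=2)
fsc.trans_w[0][1][1] = 50.0   # from node 0, observation 1 leads to node 1
fsc.trans_w[1][0][1] = 50.0   # node 1 is kept regardless of later observations
fsc.trans_w[1][1][1] = 50.0
fsc.action_w[1][1] = 50.0     # in node 1, strongly prefer action 1
```

With a single node the controller reduces to a reactive (memoryless) policy. Extra nodes let the action depend on earlier observations, which is the kind of memory the abstract argues is usually needed in a partially observable MDP.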
Taxonomy
Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Machine Learning and Algorithms
