Learning Policies with External Memory
Leonid Peshkin, Nicolas Meuleau, Leslie Kaelbling

TL;DR
This paper investigates learning reactive policies in partially observable environments by letting the agent set and clear bits in an external memory. It compares SARSA(λ) with VAPS, a new algorithm due to Baird and Moore that has convergence guarantees in partially observable domains.
Contribution
It explores a stigmergic approach for partially observable domains, in which the agent writes to an external memory and reads it back as part of its input, and compares SARSA(λ) with VAPS, a new algorithm with convergence guarantees, on benchmark problems.
Findings
VAPS shows competitive performance on benchmark problems.
External memory enables better handling of partial observability.
Comparison highlights strengths and limitations of SARSA(λ) and VAPS.
Abstract
In order for an agent to perform well in partially observable domains, it is usually necessary for actions to depend on the history of observations. In this paper, we explore a stigmergic approach, in which the agent's actions include the ability to set and clear bits in an external memory, and the external memory is included as part of the input to the agent. In this case, we need to learn a reactive policy in a highly non-Markovian domain. We explore two algorithms: SARSA(λ), which has had empirical success in partially observable domains, and VAPS, a new algorithm due to Baird and Moore, with convergence guarantees in partially observable domains. We compare the performance of these two algorithms on benchmark problems.
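The stigmergic setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `ExternalMemoryWrapper` class, the toy `CueEnv` task, and the choice to pair each base action with a per-bit memory operation in a single composite action are all assumptions made for illustration. The sketch shows only how memory bits become part of both the action and the observation, so that a purely reactive policy can act on information it saw earlier; no learning algorithm is included.

```python
from itertools import product

LEAVE, SET, CLEAR = 0, 1, 2  # hypothetical per-bit memory operations


class CueEnv:
    """Hypothetical toy task: a cue is visible only at the first step;
    the agent is rewarded if its second action matches the cue."""
    actions = ("left", "right")

    def __init__(self, cue="left"):
        self.cue = cue
        self.t = 0

    def reset(self):
        self.t = 0
        return self.cue  # the cue is observed once, at reset

    def step(self, action):
        self.t += 1
        if self.t == 1:
            return "blank", 0.0, False
        reward = 1.0 if action == self.cue else 0.0
        return "blank", reward, True


class ExternalMemoryWrapper:
    """Adds n_bits of agent-controlled memory to a base environment.
    Augmented action: (base action, tuple of per-bit operations).
    Augmented observation: (base observation, tuple of memory bits)."""

    def __init__(self, env, n_bits=1):
        self.env = env
        self.n_bits = n_bits
        self.memory = (0,) * n_bits
        self.actions = [
            (a, ops)
            for a in env.actions
            for ops in product((LEAVE, SET, CLEAR), repeat=n_bits)
        ]

    def reset(self):
        self.memory = (0,) * self.n_bits
        return (self.env.reset(), self.memory)

    def step(self, action):
        base_action, ops = action
        # Apply the memory operations, then act in the base environment.
        self.memory = tuple(
            1 if op == SET else 0 if op == CLEAR else bit
            for bit, op in zip(self.memory, ops)
        )
        obs, reward, done = self.env.step(base_action)
        return (obs, self.memory), reward, done


def reactive_policy(aug_obs):
    """Hand-written reactive policy: depends only on the current
    augmented observation, never on an explicit history."""
    obs, memory = aug_obs
    if obs == "right":
        return ("left", (SET,))    # store the cue in the bit
    if obs == "left":
        return ("left", (CLEAR,))
    # The cue is gone; read it back from memory.
    return ("right" if memory[0] else "left", (LEAVE,))


def run_episode(env, policy):
    aug_obs = env.reset()
    total, done = 0.0, False
    while not done:
        aug_obs, reward, done = env.step(policy(aug_obs))
        total += reward
    return total
```

For example, `run_episode(ExternalMemoryWrapper(CueEnv("right")), reactive_policy)` runs one episode of the toy task. In the setting the abstract describes, the hand-written `reactive_policy` would instead be learned, for instance as a table over augmented observations and augmented actions.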
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Data Stream Mining Techniques
