Learning Policies with External Memory

Leonid Peshkin; Nicolas Meuleau; Leslie Kaelbling

arXiv:cs/0103003 · cs.LG · May 23, 2007 · 92 cites

TL;DR

This paper studies learning reactive policies in partially observable environments where the agent can set and clear bits in an external memory, comparing SARSA(λ) with VAPS, an algorithm due to Baird and Moore that has convergence guarantees in partially observable domains.

Contribution

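It explores a stigmergic approach, in which the agent's actions can set and clear bits in an external memory that is fed back as part of its input, and compares two algorithms for learning the resulting reactive policy: SARSA(λ) and VAPS, the latter with convergence guarantees in partially observable domains.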

Findings

01. VAPS shows competitive performance on benchmark problems.

02. External memory enables better handling of partial observability.

03. Comparison highlights strengths and limitations of SARSA(λ) and VAPS.

Abstract

In order for an agent to perform well in partially observable domains, it is usually necessary for actions to depend on the history of observations. In this paper, we explore a stigmergic approach, in which the agent's actions include the ability to set and clear bits in an external memory, and the external memory is included as part of the input to the agent. In this case, we need to learn a reactive policy in a highly non-Markovian domain. We explore two algorithms: SARSA(λ), which has had empirical success in partially observable domains, and VAPS, a new algorithm due to Baird and Moore, with convergence guarantees in partially observable domains. We compare the performance of these two algorithms on benchmark problems.
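
The external-memory interface described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `ExternalMemory`, the action encoding (environment actions first, then one "set" and one "clear" action per bit), and the choice to treat memory actions as separate steps that do not touch the environment are all assumptions made for illustration. The sketch only shows how the action set grows to include bit manipulation and how the memory contents become part of the policy's input.

```python
from dataclasses import dataclass, field
from typing import Hashable, List, Optional, Tuple


@dataclass
class ExternalMemory:
    """Hypothetical wrapper around an agent's interface (not from the paper)."""

    n_env_actions: int          # number of ordinary environment actions
    n_bits: int = 1             # number of external memory bits
    bits: List[int] = field(init=False)

    def __post_init__(self) -> None:
        # All memory bits start cleared.
        self.bits = [0] * self.n_bits

    @property
    def n_actions(self) -> int:
        # Environment actions, then one "set" and one "clear" action per bit.
        return self.n_env_actions + 2 * self.n_bits

    def augment(self, observation: Hashable) -> Tuple[Hashable, Tuple[int, ...]]:
        # The policy sees the raw observation together with the memory contents.
        return (observation, tuple(self.bits))

    def apply(self, action: int) -> Optional[int]:
        """Return the environment action to execute, or None if only memory changed."""
        if not 0 <= action < self.n_actions:
            raise ValueError(f"action {action} out of range")
        if action < self.n_env_actions:
            return action                       # ordinary environment action
        k = action - self.n_env_actions
        if k < self.n_bits:
            self.bits[k] = 1                    # assumed encoding: "set bit k"
        else:
            self.bits[k - self.n_bits] = 0      # assumed encoding: "clear bit"
        return None


mem = ExternalMemory(n_env_actions=2, n_bits=1)
before = mem.augment("corridor")   # memory bit is still cleared
mem.apply(2)                       # action 2 is "set bit 0" in this encoding
after = mem.augment("corridor")    # same observation, different policy input
```

Under this encoding, two identical observations map to different policy inputs once a bit has been set, which is how a reactive (memoryless) learner such as SARSA(λ) or VAPS can come to act on part of its history.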

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Data Stream Mining Techniques