How memory architecture affects learning in a simple POMDP: the two-hypothesis testing problem

Mario Geiger; Christophe Eloy; Matthieu Wyart

arXiv:2106.08849 · cs.LG · November 19, 2021

TL;DR

This paper investigates how different memory architectures influence learning efficiency in a simple POMDP, demonstrating that constrained memory can improve training outcomes despite potential sacrifices in optimality.

Contribution

The study compares flexible and fixed memory architectures in a POMDP, showing that constrained memory can achieve near-optimal performance while making training more likely to succeed.
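
The two architectures differ mainly in who controls the memory update. As a minimal sketch, assuming binary actions and Bernoulli rewards, the fixed memory below is a shift register over the last m (action, reward) pairs, while the random access memory lets a transition table choose the next of M abstract states. The class names, the `transitions` table, and the choice to condition transitions only on the last reward are illustrative assumptions, not the paper's or the repository's code.

```python
import random
from collections import deque
from typing import Deque, Dict, List, Tuple

Action = int  # 0 or 1: which arm (hypothesis) is chosen
Reward = int  # 0 or 1: Bernoulli outcome


class FixedMemory:
    """Fixed architecture (hypothetical sketch): memory is the last m
    (action, reward) pairs. The update rule is a shift register that the
    policy cannot change; the policy only chooses actions."""

    def __init__(self, m: int):
        self.buffer: Deque[Tuple[Action, Reward]] = deque(maxlen=m)

    def update(self, action: Action, reward: Reward) -> None:
        # With maxlen=m, the oldest pair is dropped automatically.
        self.buffer.append((action, reward))

    def state(self) -> Tuple[Tuple[Action, Reward], ...]:
        return tuple(self.buffer)


class RandomAccessMemory:
    """Flexible architecture (hypothetical sketch): M abstract states, with
    the next state drawn from a learnable table indexed by
    (current state, last reward). Any transition between states is allowed."""

    def __init__(self, M: int, transitions: Dict[Tuple[int, Reward], List[float]], seed: int = 0):
        self.M = M
        self.current = 0
        self.transitions = transitions  # each value: weights over the M next states
        self.rng = random.Random(seed)

    def update(self, reward: Reward) -> None:
        weights = self.transitions[(self.current, reward)]
        self.current = self.rng.choices(range(self.M), weights=weights, k=1)[0]

    def state(self) -> int:
        return self.current
```

In the fixed case the number of reachable memory configurations is set by m and the update is hard-wired; in the flexible case every entry of `transitions` is a free parameter, which is what makes optimization more expressive but also harder.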

Findings

1. Constrained memory architectures can match the performance of more flexible ones in a POMDP.
2. Training from random initialization is significantly more successful with fixed memory architectures.
3. The probability q of a wrong choice (playing the worse arm) decreases exponentially with memory size.
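
One way to check an exponential scaling like the one in the third finding is to fit a straight line to log q against memory size, since q = C·α^size implies log q = log C + size·log α with α < 1. The sketch below does that least-squares fit on synthetic, noise-free data generated from hypothetical values C = 0.5 and α = 0.6; these numbers are illustrative only and are not results from the paper, and the paper's exact relation between q and memory size is not reproduced here.

```python
import math
from typing import List, Tuple


def fit_exponential_decay(sizes: List[float], q: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log q = log C + size * log(alpha).

    Returns (C, alpha); alpha < 1 indicates exponential decay with memory size.
    """
    n = len(sizes)
    ys = [math.log(v) for v in q]
    x_mean = sum(sizes) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in sizes)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(sizes, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), math.exp(slope)


# Synthetic data from hypothetical C = 0.5, alpha = 0.6 (not measured values).
sizes = [1, 2, 3, 4, 5, 6]
q_values = [0.5 * 0.6 ** M for M in sizes]
C_fit, alpha_fit = fit_exponential_decay(sizes, q_values)
print(f"C ~ {C_fit:.3f}, alpha ~ {alpha_fit:.3f}")
```

With real measurements, q would be estimated by simulation and would carry noise, so the fitted α is only as reliable as those estimates.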

Abstract

Reinforcement learning is generally difficult for partially observable Markov decision processes (POMDPs), which arise when the agent's observation is partial or noisy. To seek good performance in POMDPs, one strategy is to endow the agent with a finite memory, whose update is governed by the policy. However, policy optimization is non-convex in that case and can lead to poor training performance for random initialization. The performance can be empirically improved by constraining the memory architecture, thereby sacrificing optimality to facilitate training. Here we study this trade-off in a two-hypothesis testing problem, akin to the two-arm bandit problem. We compare two extreme cases: (i) the random access memory where any transition between $M$ memory states is allowed and (ii) a fixed memory where the agent can access its last $m$ actions and rewards. For (i), the probability $q$ …
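
To make the setting concrete, the sketch below simulates a two-armed Bernoulli bandit and estimates q as the long-run fraction of pulls of the worse arm. It assumes hypothetical success probabilities and uses win-stay/lose-shift, a simple hand-designed policy that only needs the last action and reward (a fixed memory with m = 1); it is not a policy from the paper. For this policy the arm choice is a two-state Markov chain, so q can also be computed exactly for comparison.

```python
import random


def estimate_q(p_good: float, p_bad: float, steps: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of q, the fraction of pulls of the worse arm,
    for win-stay/lose-shift (memory = last action and last reward)."""
    rng = random.Random(seed)
    probs = (p_good, p_bad)  # arm 0 is the better arm by construction
    action = rng.randrange(2)  # arbitrary initial action
    bad_pulls = 0
    for _ in range(steps):
        reward = 1 if rng.random() < probs[action] else 0
        if action == 1:
            bad_pulls += 1
        # Stay on the same arm after a reward, switch after a failure.
        action = action if reward else 1 - action
    return bad_pulls / steps


def q_exact(p_good: float, p_bad: float) -> float:
    """Stationary probability of being on the worse arm: the chain leaves
    arm 0 with probability 1 - p_good and arm 1 with probability 1 - p_bad."""
    return (1 - p_good) / ((1 - p_good) + (1 - p_bad))


print(estimate_q(0.8, 0.4), q_exact(0.8, 0.4))
```

With m = 1 the error probability stays bounded away from zero (here about 0.25); driving q down requires larger memories, which is the regime the paper analyzes.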

Code & Models

Repositories

pcsl-epfl/bandit (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms