How memory architecture affects learning in a simple POMDP: the two-hypothesis testing problem
Mario Geiger, Christophe Eloy, Matthieu Wyart

TL;DR
This paper investigates how different memory architectures influence learning efficiency in a simple POMDP, demonstrating that constrained memory can improve training outcomes despite potential sacrifices in optimality.
Contribution
The study compares flexible and fixed memory architectures in a POMDP, showing constrained memory can achieve near-optimal performance and enhance training success.
Findings
Constrained memory architectures can match the performance of more flexible ones in a POMDP.
Training from random initialization is significantly more successful with fixed memory architectures.
For random access memory, the probability of an incorrect decision (playing the worse arm) decreases exponentially with the number of memory states.
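The exponential dependence on memory size in the last finding can be made concrete with a toy calculation. The sketch below is not the authors' policy. It uses one simple hand-designed memory update that a random access memory could represent: a saturating counter over m states. It assumes that each step the agent sees an i.i.d. bit equal to 1 with probability p > 1/2, regardless of its guesses, and that it guesses "p > 1/2" whenever it sits in the upper half of its memory. All function names are hypothetical. Detailed balance gives a stationary weight proportional to r**i with r = p / (1 - p), so for even m the long-run error is 1 / (1 + r**(m / 2)), which is exponentially small in m. The code checks that formula against a numerically computed stationary distribution.

```python
def counter_transitions(m, p):
    """Transition matrix of a saturating counter with m memory states.

    Assumption: each step the agent sees a bit that is 1 with probability p.
    On a 1 it moves up one state (capped at m - 1); on a 0 it moves down
    (floored at 0).
    """
    P = [[0.0] * m for _ in range(m)]
    for i in range(m):
        P[i][min(i + 1, m - 1)] += p
        P[i][max(i - 1, 0)] += 1 - p
    return P


def stationary(P, iters=5000):
    """Stationary distribution by power iteration.

    The chain is irreducible and aperiodic (self-loops at both ends),
    so repeated multiplication converges.
    """
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
    return pi


def error_probability(m, p):
    """Long-run probability of guessing wrong when the truth is p > 1/2.

    The agent guesses 'p > 1/2' in the upper half of memory, so the
    error is the stationary mass of the lower half.
    """
    pi = stationary(counter_transitions(m, p))
    return sum(pi[: m // 2])


def closed_form(m, p):
    """Detailed balance: pi_i is proportional to r**i with r = p / (1 - p).

    For even m this gives error = 1 / (1 + r**(m / 2)), i.e. exponential
    decay in the number of memory states.
    """
    r = p / (1 - p)
    return 1.0 / (1.0 + r ** (m // 2))


if __name__ == "__main__":
    for m in range(2, 13, 2):
        print(m, error_probability(m, 0.6), closed_form(m, 0.6))
```

Each extra pair of memory states divides the error by roughly r, which is the exponential scaling in miniature. A better-designed finite-memory rule changes the rate, not the qualitative picture.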
Abstract
Reinforcement learning is generally difficult for partially observable Markov decision processes (POMDPs), which arise when the agent's observation is partial or noisy. To seek good performance in POMDPs, one strategy is to endow the agent with a finite memory, whose update is governed by the policy. However, policy optimization is non-convex in that case and can lead to poor training performance for random initialization. The performance can be empirically improved by constraining the memory architecture, thus sacrificing optimality to facilitate training. Here we study this trade-off in a two-hypothesis testing problem, akin to the two-arm bandit problem. We compare two extreme cases: (i) the random access memory, where any transition between memory states is allowed, and (ii) a fixed memory, where the agent can access its last actions and rewards. For (i), the probability …
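To make the two extreme architectures concrete, here is a minimal sketch of how each might be parameterized on a two-arm Bernoulli bandit. It is an illustration under stated assumptions, not the paper's implementation: tabular softmax policies with Gaussian random initialization, the reward as the only observation, a dummy reward of 0 before the first pull, and, for (i), joint sampling of the action and the next memory state. The class and function names are hypothetical. With the same 16 memory states (k = 2 past action/reward pairs for (ii)), the random access agent has to learn its memory transitions as well as its actions, while the fixed agent only learns actions on top of a hard-wired shift register.

```python
import itertools
import math
import random

N_ACTIONS = 2  # two arms / two hypotheses
N_OBS = 2      # binary reward observed after each pull


def softmax(logits):
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class RandomAccessMemoryAgent:
    """(i) Any memory transition is allowed: the policy jointly samples
    (action, next memory state) given (current memory state, last reward)."""

    def __init__(self, n_mem, rng):
        self.n_mem, self.rng, self.mem = n_mem, rng, 0
        # One logit per (m, o, a, m'): |M|^2 * |O| * |A| trainable parameters.
        self.logits = {
            (m, o): [rng.gauss(0.0, 1.0) for _ in range(N_ACTIONS * n_mem)]
            for m in range(n_mem)
            for o in range(N_OBS)
        }

    def n_params(self):
        return sum(len(v) for v in self.logits.values())

    def step(self, reward):
        probs = softmax(self.logits[(self.mem, reward)])
        k = self.rng.choices(range(len(probs)), weights=probs)[0]
        action, self.mem = divmod(k, self.n_mem)  # decode the joint index
        return action


class FixedMemoryAgent:
    """(ii) Hard-wired memory: the last k (action, reward) pairs. The memory
    update is a fixed shift register; only the action policy is trainable."""

    def __init__(self, k, rng):
        self.rng, self.last_action = rng, 0
        self.history = (0,) * (2 * k)  # flattened (a, r, a, r, ...) tuple
        # (|A| * |O|)**k memory states, one logit per (state, action).
        self.logits = {
            h: [rng.gauss(0.0, 1.0) for _ in range(N_ACTIONS)]
            for h in itertools.product(range(N_ACTIONS), range(N_OBS), repeat=k)
        }

    def n_params(self):
        return sum(len(v) for v in self.logits.values())

    def step(self, reward):
        # Drop the oldest (action, reward) pair and append the newest one.
        self.history = self.history[2:] + (self.last_action, reward)
        probs = softmax(self.logits[self.history])
        self.last_action = self.rng.choices(range(N_ACTIONS), weights=probs)[0]
        return self.last_action


def play(agent, arm_means, n_steps, rng):
    """Run one episode on a Bernoulli two-arm bandit; return the total reward."""
    reward, total = 0, 0  # dummy observation before the first pull
    for _ in range(n_steps):
        action = agent.step(reward)
        reward = int(rng.random() < arm_means[action])
        total += reward
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    ram = RandomAccessMemoryAgent(n_mem=16, rng=rng)
    fixed = FixedMemoryAgent(k=2, rng=rng)
    print(ram.n_params(), fixed.n_params())  # 1024 32
    print(play(ram, (0.4, 0.6), 100, rng), play(fixed, (0.4, 0.6), 100, rng))
```

The parameter counts (1024 versus 32 for the same 16 memory states) give one rough intuition for why constraining the memory may make optimization from random initialization easier, at the cost of excluding memory updates the fixed architecture cannot express.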
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms
