On the Importance of Multistability for Horizon Generalization in Reinforcement Learning
Asad Bakija, Florent De Geeter, Julien Brandoit, Pierre Sacré, Guillaume Drion

TL;DR
This paper investigates the role of multistability in RNNs for enabling reinforcement learning agents to generalize across different temporal horizons, highlighting the importance of dynamical regimes for long-term decision making.
Contribution
It formalizes temporal horizon generalization, establishes multistability as necessary and sufficient in simple tasks, and analyzes why current architectures fail to generalize across horizons.
Findings
Multistability is necessary for horizon generalization.
Modern parallelizable RNNs are monostable and fail to generalize.
Transient dynamics are crucial for complex tasks.
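As a rough illustration of the first two findings, the sketch below contrasts a monostable unit with a bistable one when no input arrives during the horizon. Both units are hypothetical toy examples chosen for this summary (a scalar linear recurrence with gain 0.9 and a scalar tanh unit with self-weight 2.0); they are not the architectures or tasks evaluated in the paper. The monostable state decays toward its single fixed point, so the stored sign fades as the horizon grows, while the bistable state settles on one of two stable fixed points and keeps the sign at any horizon.

```python
import math


def linear_unit(h, a=0.9):
    """Monostable toy unit: for |a| < 1 the only fixed point is h = 0."""
    return a * h


def bistable_unit(h, w=2.0):
    """Bistable toy unit: for w > 1, h = tanh(w * h) has two stable
    fixed points (+h*, -h*) and an unstable fixed point at 0."""
    return math.tanh(w * h)


def hold(step, h0, horizon):
    """Iterate `step` for `horizon` time steps with no external input."""
    h = h0
    for _ in range(horizon):
        h = step(h)
    return h


if __name__ == "__main__":
    # The initial state encodes one bit (its sign); the horizon is the delay
    # before that bit would be needed to pick the right action.
    for horizon in (10, 100, 1000):
        lin = [hold(linear_unit, h0, horizon) for h0 in (-1.0, 1.0)]
        bis = [hold(bistable_unit, h0, horizon) for h0 in (-1.0, 1.0)]
        print(f"horizon={horizon:5d}  monostable={lin}  bistable={bis}")
```

In this toy setting, a readout tuned on the monostable state at a short horizon sees a vanishing margin at longer horizons, whereas the bistable state is essentially identical at every horizon once it has converged.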
Abstract
In reinforcement learning (RL), agents acting in partially observable Markov decision processes (POMDPs) must rely on memory, typically encoded in a recurrent neural network (RNN), to integrate information from past observations. Long-horizon POMDPs, in which the relevant observation and the optimal action are separated by many time steps (called the horizon), are particularly challenging: training suffers from poor generalization, severe sample inefficiency, and prohibitive exploration costs. Ideally, an agent trained on short horizons would retain optimal behavior at arbitrarily longer ones, but no formal framework currently characterizes when this is achievable. To fill this gap, we formalized temporal horizon generalization, the property that a policy remains optimal for all horizons, derived a necessary and sufficient condition for it, and experimentally evaluated the ability of…
