Learning to Focus: Prioritizing Informative Histories with Structured Attention Mechanisms in Partially Observable Reinforcement Learning
Daniel De Dios Allegue, Jinke He, Frans A. Oliehoek

TL;DR
This paper introduces structured attention priors into Transformer-based world models for partially observable RL, significantly improving data efficiency by better prioritizing informative past transitions.
Contribution
The paper proposes Gaussian and memory-length priors for self-attention; the Gaussian priors notably improve performance under partial observability.
Findings
The Gaussian prior yields a 77% relative score improvement.
Memory-length priors often truncate useful signals.
Structured priors improve data efficiency in RL.
Abstract
Transformers have shown strong ability to model long-term dependencies and are increasingly adopted as world models in model-based reinforcement learning (RL) under partial observability. However, unlike natural language corpora, RL trajectories are sparse and reward-driven, making standard self-attention inefficient because it distributes weight uniformly across all past tokens rather than emphasizing the few transitions critical for control. To address this, we introduce structured inductive priors into the self-attention mechanism of the dynamics head: (i) per-head memory-length priors that constrain attention to task-specific windows, and (ii) distributional priors that learn smooth Gaussian weightings over past state-action pairs. We integrate these mechanisms into UniZero, a model-based RL agent with a Transformer-based world model that supports planning under partial…
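The two priors named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' or UniZero's implementation. It assumes each prior enters as an additive log-bias on the causal attention logits, indexed by the lag between query and key positions. The function names and the fixed values of `mu`, `sigma`, and `length` are hypothetical; the abstract describes these quantities as per-head and learned or task-specific.

```python
import numpy as np

def lag_matrix(T):
    # lag[i, j] = i - j: how many steps key j lies in the past of query i.
    idx = np.arange(T)
    return idx[:, None] - idx[None, :]

def gaussian_log_prior(T, mu, sigma):
    # Assumed form: smooth Gaussian weighting over lags, as a log-bias.
    lag = lag_matrix(T)
    return -((lag - mu) ** 2) / (2.0 * sigma ** 2)

def window_log_prior(T, length):
    # Assumed form: hard memory window; keys with lag >= length are masked.
    lag = lag_matrix(T)
    return np.where(lag < length, 0.0, -np.inf)

def attention_weights(q, k, log_prior):
    # Scaled dot-product logits plus the prior bias, with a causal mask.
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d) + log_prior
    scores = np.where(lag_matrix(T) >= 0, scores, -np.inf)
    # Numerically stable softmax; the diagonal is always finite here.
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
T, d = 6, 4
q, k = rng.normal(size=(T, d)), rng.normal(size=(T, d))

w_gauss = attention_weights(q, k, gaussian_log_prior(T, mu=2.0, sigma=1.0))
w_window = attention_weights(q, k, window_log_prior(T, length=3))
```

In this sketch the window prior assigns exactly zero weight to anything older than `length` steps, whereas the Gaussian prior only down-weights distant lags. That difference is one way to read the finding that memory-length priors can truncate useful signals.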
Taxonomy
Topics: Reinforcement Learning in Robotics · Social Robot Interaction and HRI · Embodied and Extended Cognition
