Heuristic Transformer: Belief Augmented In-Context Reinforcement Learning
Oliver Dippel, Alexei Lisitsa, Bei Peng

TL;DR
This paper introduces Heuristic Transformer (HT), an in-context reinforcement learning method that uses a belief distribution over rewards to improve decision-making, and reports that it outperforms baselines in the Darkroom, Miniworld, and MuJoCo environments.
Contribution
The paper proposes Heuristic Transformer, which incorporates a learned belief distribution over rewards into transformer-based reinforcement learning, enhancing decision accuracy and generalization.
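The abstract states that this belief is learned with a variational auto-encoder (VAE). For reference, a generic VAE objective for a belief over rewards is shown below. It is a sketch, not the paper's stated loss: the in-context dataset $c$ of transitions $(s, a, r, s')$, the encoder $q_\phi(z \mid c)$, the reward decoder $p_\theta(r \mid s, a, z)$ conditioned on state and action, and the standard normal prior $p(z)$ are all assumptions made for illustration.

```latex
\mathcal{L}(\phi, \theta) =
  \mathbb{E}_{z \sim q_\phi(z \mid c)}\Big[ \sum_{(s, a, r, s') \in c} \log p_\theta(r \mid s, a, z) \Big]
  - \mathrm{KL}\big( q_\phi(z \mid c) \,\|\, p(z) \big),
\qquad p(z) = \mathcal{N}(0, I)
```

Maximizing the first term makes $z$ informative about the rewards seen in context, while the KL term keeps the approximate posterior close to the prior, so $z$ acts as a compact, stochastic summary of what the agent believes about the reward function.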
Findings
HT outperforms baselines in Darkroom, Miniworld, and MuJoCo environments.
Augmenting the prompt with the reward belief improves decision-making.
The approach bridges belief modeling and transformer-based decision-making.
Abstract
Transformers have demonstrated exceptional in-context learning (ICL) capabilities, enabling applications across natural language processing, computer vision, and sequential decision-making. In reinforcement learning, ICL reframes learning as a supervised problem, facilitating task adaptation without parameter updates. Building on prior work leveraging transformers for sequential decision-making, we propose Heuristic Transformer (HT), an in-context reinforcement learning (ICRL) approach that augments the in-context dataset with a belief distribution over rewards to achieve better decision-making. Using a variational auto-encoder (VAE), a low-dimensional stochastic variable is learned to represent the posterior distribution over rewards; this variable is incorporated, alongside an in-context dataset and query states, into the prompt given to the transformer policy. We assess the performance of HT across the…
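To make the prompt construction concrete, the following is a minimal sketch in plain Python of how a sampled belief variable might be combined with an in-context dataset and a query state. It assumes a VAE encoder has already produced a mean and log-variance for the belief; the names `sample_belief`, `gaussian_kl`, `build_prompt`, and `pad`, the flat-list token format, the single prepended belief token, and the zero padding are hypothetical choices for illustration, not the paper's implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical transition format: (state, action, reward, next_state).
Transition = Tuple[List[float], List[float], float, List[float]]


def sample_belief(mu: Sequence[float], logvar: Sequence[float],
                  rng: random.Random) -> List[float]:
    """Reparameterized sample z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), the usual VAE regularizer."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))


def pad(vec: Sequence[float], width: int) -> List[float]:
    """Right-pad a vector with zeros so every token has the same width."""
    if len(vec) > width:
        raise ValueError("vector is wider than the token width")
    return list(vec) + [0.0] * (width - len(vec))


def build_prompt(context: Sequence[Transition], query_state: Sequence[float],
                 z: Sequence[float]) -> List[List[float]]:
    """Assemble [belief token, transition tokens..., query token].

    Assumption: the belief is a separate leading token; the paper only says
    it is incorporated alongside the in-context dataset and query states.
    """
    transition_tokens = [list(s) + list(a) + [float(r)] + list(s_next)
                         for s, a, r, s_next in context]
    width = max([len(t) for t in transition_tokens]
                + [len(query_state), len(z)])
    tokens = [pad(z, width)]
    tokens += [pad(t, width) for t in transition_tokens]
    tokens.append(pad(query_state, width))
    return tokens


# Toy usage: two transitions with 2-D states and 1-D actions.
rng = random.Random(0)
context = [([0.0, 0.0], [1.0], 0.0, [0.0, 1.0]),
           ([0.0, 1.0], [1.0], 1.0, [1.0, 1.0])]
mu, logvar = [0.2, -0.1], [-1.0, -1.0]  # stand-in encoder outputs
z = sample_belief(mu, logvar, rng)
prompt = build_prompt(context, [1.0, 1.0], z)
print(len(prompt), len(prompt[0]))  # 4 tokens, each of width 6
```

A transformer policy would then embed each token and predict an action for the final (query) token. Keeping the belief as its own token is only one option; concatenating $z$ to every token is an equally plausible alternative under these assumptions.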
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning
