Learning How to Infer Partial MDPs for In-Context Adaptation and Exploration
Chentian Jiang, Nan Rosemary Ke, Hado van Hasselt

TL;DR
This paper trains a transformer to infer partial MDPs from context, enabling in-context adaptation and exploration with near-oracle performance and without costly Bayesian inference.
Contribution
It proposes a novel approach that learns to infer partial models from training tasks, enabling efficient in-context adaptation and exploration without gradient updates.
Findings
On the authors' version of the Symbolic Alchemy benchmark, the method approaches an oracle's adaptation speed and exploration-exploitation balance.
Partial models can still produce effective policies despite missing information.
The method sidesteps explicit Bayesian inference by learning the inference process from training tasks, and keeps dynamic programming cheap by planning in small partial MDPs.
Abstract
To generalize across tasks, an agent should acquire knowledge from past tasks that facilitates adaptation and exploration in future tasks. We focus on the problem of in-context adaptation and exploration, where an agent relies only on context, i.e., a history of states, actions and/or rewards, rather than on gradient-based updates. Posterior sampling (an extension of Thompson sampling) is a promising approach, but it requires Bayesian inference and dynamic programming, which often involve unknowns (e.g., a prior) and costly computations. To address these difficulties, we use a transformer to learn an inference process from training tasks and consider a hypothesis space of partial models, represented as small Markov decision processes that are cheap for dynamic programming. In our version of the Symbolic Alchemy benchmark, our method's adaptation speed and exploration-exploitation balance…
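To make the posterior sampling baseline mentioned in the abstract concrete, here is a minimal sketch under strong simplifying assumptions: a finite hypothesis set of tiny tabular MDPs, an exact Bayes update, and value iteration as the dynamic programming step. This is not the authors' method, which replaces the explicit Bayes update with a transformer trained on past tasks. Every name below (`make_mdp`, `posterior_update`, `sample_policy`, the two-state toy MDP) is hypothetical and chosen only for illustration.

```python
import random


def value_iteration(P, R, gamma=0.9, iters=200):
    """Return (Q, greedy_policy) for a tabular MDP.

    P[s][a][s2] is a transition probability; R[s][a] is an expected reward.
    """
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        for s in range(n_states):
            for a in range(n_actions):
                Q[s][a] = R[s][a] + gamma * sum(
                    p * V[s2] for s2, p in enumerate(P[s][a]))
        V = [max(q) for q in Q]  # synchronous backup after a full sweep
    policy = [max(range(n_actions), key=lambda a: Q[s][a])
              for s in range(n_states)]
    return Q, policy


def posterior_update(weights, hypotheses, s, a, s_next):
    """Bayes rule over a finite hypothesis set after observing (s, a, s_next)."""
    unnorm = [w * h["P"][s][a][s_next] for w, h in zip(weights, hypotheses)]
    total = sum(unnorm)
    if total == 0.0:
        return list(weights)  # observation impossible under all hypotheses
    return [u / total for u in unnorm]


def sample_policy(weights, hypotheses, rng):
    """Posterior sampling step: draw one MDP from the posterior, then plan in it."""
    h = rng.choices(hypotheses, weights=weights, k=1)[0]
    return value_iteration(h["P"], h["R"])[1]


def make_mdp(good_action):
    """Hypothetical 2-state MDP: from state 0, `good_action` usually reaches state 1."""
    to_goal, to_start = [0.1, 0.9], [0.9, 0.1]
    P0 = [to_goal if a == good_action else to_start for a in range(2)]
    P1 = [[0.0, 1.0], [0.0, 1.0]]  # state 1 is absorbing
    R = [[0.0, 0.0], [1.0, 1.0]]   # reward 1 per step spent in state 1
    return {"P": [P0, P1], "R": R}


H0, H1 = make_mdp(0), make_mdp(1)
weights = [0.5, 0.5]  # assumed uniform prior over the two hypotheses
# The agent takes action 0 in state 0 and lands in state 1, which favors H0.
weights = posterior_update(weights, [H0, H1], s=0, a=0, s_next=1)
policy = sample_policy(weights, [H0, H1], random.Random(0))
```

Because each hypothesis is a small MDP, the planning step stays cheap; the costly parts in realistic settings are the prior and the posterior update, which is what the paper proposes to learn instead of compute.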
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Data Stream Mining Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
