Test-time Offline Reinforcement Learning on Goal-related Experience
Marco Bagatella, Mert Albaba, Jonas Hübotter, Georg Martius, Andreas Krause

TL;DR
This paper introduces GC-TTT, a test-time offline reinforcement learning method that adapts policies to current goals using a novel data selection criterion, improving performance in high-dimensional tasks with modest compute costs.
Contribution
It proposes a goal-conditioned test-time training algorithm that fine-tunes policies during evaluation using relevant experience, enhancing performance over standard offline methods.
Findings
GC-TTT improves policy performance across diverse tasks.
Selecting data by relevance makes fine-tuning more effective.
Performance gains come from better allocation of compute, not from a larger model.
Abstract
Foundation models compress a large amount of information in a single, large neural network, which can then be queried for individual tasks. There are strong parallels between this widespread framework and offline goal-conditioned reinforcement learning algorithms: a universal value function is trained on a large number of goals, and the policy is evaluated on a single goal in each test episode. Extensive research in foundation models has shown that performance can be substantially improved through test-time training, specializing the model to the current goal. We find similarly that test-time offline reinforcement learning on experience related to the test goal can lead to substantially better policies at modest compute costs. We propose a novel self-supervised data selection criterion, which selects transitions from an offline dataset according to their relevance to the current state…
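The idea of selecting goal-related experience before fine-tuning can be illustrated with a minimal sketch. The abstract is truncated before it defines the actual selection criterion, so the example below substitutes plain Euclidean distance to the current state as a stand-in relevance measure. The function name `select_relevant_transitions`, the parameter `k`, and the toy dataset are all hypothetical and are not taken from the paper.

```python
import numpy as np


def select_relevant_transitions(states, current_state, k):
    """Return indices of the k dataset states closest to current_state.

    Euclidean distance is an assumed stand-in for relevance;
    it is NOT the selection criterion proposed in the paper.
    """
    dists = np.linalg.norm(states - current_state, axis=1)
    k = min(k, len(states))
    # argsort orders indices from smallest to largest distance
    return np.argsort(dists)[:k]


# Toy offline dataset of five 2-D states (hypothetical values)
states = np.array([
    [0.0, 0.0],
    [1.0, 1.0],
    [0.1, -0.1],
    [5.0, 5.0],
    [0.2, 0.2],
])
current_state = np.array([0.0, 0.0])

selected = select_relevant_transitions(states, current_state, k=3)
```

In a test-time training loop, the transitions at the `selected` indices would form the small dataset on which the policy is briefly fine-tuned before acting, which is how compute gets concentrated on experience related to the current episode.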
