Vintix: Action Model via In-Context Reinforcement Learning
Andrey Polubarov, Nikita Lyubaykin, Alexander Derevyagin, Ilya Zisman, Denis Tarasov, Alexander Nikulin, Vladislav Kurenkov

TL;DR
Vintix introduces a scalable in-context reinforcement learning framework that enables a fixed, cross-domain model to learn behaviors through trial-and-error, advancing the development of generalist decision-making agents.
Contribution
This work takes first steps toward scaling ICRL beyond toy tasks and single-domain settings, using Algorithm Distillation to build a fixed, cross-domain action model.
Findings
Algorithm Distillation is a compelling and competitive alternative to expert distillation for constructing versatile action models.
A single fixed, cross-domain model learns behaviors in context through trial-and-error interaction.
Results demonstrate ICRL's potential for scalable generalist agents.
Abstract
In-Context Reinforcement Learning (ICRL) represents a promising paradigm for developing generalist agents that learn at inference time through trial-and-error interactions, analogous to how large language models adapt contextually, but with a focus on reward maximization. However, the scalability of ICRL beyond toy tasks and single-domain settings remains an open challenge. In this work, we present the first steps toward scaling ICRL by introducing a fixed, cross-domain model capable of learning behaviors through in-context reinforcement learning. Our results demonstrate that Algorithm Distillation, a framework designed to facilitate ICRL, offers a compelling and competitive alternative to expert distillation for constructing versatile action models. These findings highlight the potential of ICRL as a scalable approach for generalist decision-making systems. Code released at…
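To make the idea of learning at inference time with a fixed model concrete, here is a minimal sketch on a toy multi-armed bandit. It is not the Vintix model or its released code: `frozen_policy` and `run_in_context` are hypothetical names, and the hand-written averaging rule is only a stand-in for a trained sequence model. The point it illustrates is that nothing inside the policy is updated between steps; behavior improves only because the context of past (action, reward) pairs grows.

```python
import random


def frozen_policy(context, n_actions, rng, explore=0.1):
    """Hypothetical stand-in for a fixed sequence model.

    No parameters are updated anywhere: the action depends only on the
    (action, reward) history passed in as `context`.
    """
    tried = {a for a, _ in context}
    untried = [a for a in range(n_actions) if a not in tried]
    if untried:
        # Try every action once before exploiting.
        return untried[0]
    if rng.random() < explore:
        # Occasional random action (assumed exploration rate).
        return rng.randrange(n_actions)
    totals = [0.0] * n_actions
    counts = [0] * n_actions
    for a, r in context:
        totals[a] += r
        counts[a] += 1
    means = [totals[a] / counts[a] for a in range(n_actions)]
    # Exploit the action with the best average reward seen in context.
    return max(range(n_actions), key=lambda a: means[a])


def run_in_context(arm_probs, steps, seed=0):
    """Roll out the frozen policy on a Bernoulli bandit (toy assumption)."""
    rng = random.Random(seed)
    context = []  # grows over time; this is the only thing that changes
    rewards = []
    for _ in range(steps):
        action = frozen_policy(context, len(arm_probs), rng)
        reward = 1.0 if rng.random() < arm_probs[action] else 0.0
        context.append((action, reward))
        rewards.append(reward)
    return context, rewards


if __name__ == "__main__":
    _, rewards = run_in_context([0.2, 0.8], steps=500)
    print("mean reward, first 50 steps:", sum(rewards[:50]) / 50)
    print("mean reward, last 50 steps:", sum(rewards[-50:]) / 50)
```

In an ICRL system of the kind the abstract describes, the averaging rule would be replaced by a trained model that consumes the same kind of interaction history; how that model is trained and what its context contains are outside this sketch.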
