Random Policy Enables In-Context Reinforcement Learning within Trust Horizons
Weiqin Chen, Santiago Paternain

TL;DR
This paper introduces State-Action Distillation (SAD), a novel method enabling in-context reinforcement learning using only random policies for pretraining, significantly broadening applicability in real-world scenarios.
Contribution
SAD is the first approach to facilitate effective in-context RL with random policies, removing the need for optimal or well-trained policies during pretraining.
Findings
SAD outperforms baselines by 236.3% in offline evaluation
SAD outperforms baselines by 135.2% in online evaluation
Empirical validation across multiple benchmarks
Abstract
Pretrained foundation models (FMs) have exhibited extraordinary in-context learning performance, allowing zero-shot generalization to new tasks not encountered during pretraining. In the case of reinforcement learning (RL), in-context RL (ICRL) emerges when pretraining FMs on decision-making problems in an autoregressive-supervised manner. Nevertheless, current state-of-the-art ICRL algorithms, such as Algorithm Distillation, Decision Pretrained Transformer, and Decision Importance Transformer, impose stringent requirements on the pretraining dataset concerning the source policies, context information, and action labels. Notably, these algorithms either demand optimal policies or require varying degrees of well-trained behavior policies for all pretraining environments. This significantly hinders the application of ICRL to real-world scenarios, where acquiring optimal or well-trained policies for…
Taxonomy
Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management
Methods: Absolute Position Encodings · Adam · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing · Transformer · Softmax · Dense Connections
