Random Policy Enables In-Context Reinforcement Learning within Trust Horizons

Weiqin Chen; Santiago Paternain

arXiv:2410.19982 · cs.LG · May 5, 2025

TL;DR

This paper introduces State-Action Distillation (SAD), a novel method enabling in-context reinforcement learning using only random policies for pretraining, significantly broadening applicability in real-world scenarios.

Contribution

SAD is the first approach to facilitate effective in-context RL with random policies, removing the need for optimal or well-trained policies during pretraining.

Findings

01. SAD outperforms baselines by 236.3% in offline evaluation

02. SAD outperforms baselines by 135.2% in online evaluation

03. Results are validated empirically across multiple benchmarks

Abstract

Pretrained foundation models (FMs) have exhibited extraordinary in-context learning performance, allowing zero-shot generalization to new tasks not encountered during pretraining. In the case of reinforcement learning (RL), in-context RL (ICRL) emerges when pretraining FMs on decision-making problems in an autoregressive-supervised manner. Nevertheless, current state-of-the-art ICRL algorithms, like Algorithm Distillation, Decision Pretrained Transformer and Decision Importance Transformer, impose stringent requirements on the pretraining dataset concerning the source policies, context information, and action labels. Notably, these algorithms either demand optimal policies or require varying degrees of well-trained behavior policies for all pretraining environments. This significantly hinders the application of ICRL to real-world scenarios, where acquiring optimal or well-trained policies for…

Taxonomy

Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management

Methods: Absolute Position Encodings · Adam · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing · Transformer · Softmax · Dense Connections