Optimal Dynamic Regret by Transformers for Non-Stationary Reinforcement Learning

Baiyuan Chen; Shinji Ito; Masaaki Imaizumi

arXiv:2508.16027 · stat.ML · October 24, 2025

TL;DR

This paper shows, through theoretical analysis and experiments, that transformers can handle non-stationary reinforcement learning environments: they achieve nearly optimal dynamic regret bounds and, empirically, match or outperform existing expert algorithms.
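Dynamic regret compares the learner against the optimal policy of whichever environment is active at each step, rather than against one fixed comparator. A common episodic formulation is sketched below; the notation ($M_k$ for the MDP active in episode $k$, $\pi_k$ for the learner's policy, $V^{\pi}_{M}$ for the value of $\pi$ in $M$) is assumed for illustration and may differ from the paper's.

```latex
% Dynamic regret over K episodes: compare with the best policy of each episode's MDP.
\mathrm{D\text{-}Regret}(K) = \sum_{k=1}^{K} \Big( V^{\star}_{M_k} - V^{\pi_k}_{M_k} \Big),
\qquad V^{\star}_{M_k} = \max_{\pi} V^{\pi}_{M_k}.

% Static regret: compare with the single best fixed policy in hindsight.
\mathrm{Regret}(K) = \max_{\pi} \sum_{k=1}^{K} V^{\pi}_{M_k} - \sum_{k=1}^{K} V^{\pi_k}_{M_k}.

% Since a sum of maxima is at least the maximum of the sums,
% \mathrm{D\text{-}Regret}(K) \ge \mathrm{Regret}(K); the two coincide when M_k does not change.
```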

Contribution

It provides the first theoretical proof that transformers can approximate strategies for non-stationary RL and can learn these strategies in-context, with empirical results supporting their effectiveness.

Findings

1. Transformers can achieve nearly optimal dynamic regret bounds in non-stationary RL.

2. Transformers can learn strategies for non-stationary environments in-context.

3. Transformers match or outperform existing expert algorithms in experiments.

Abstract

Transformers have demonstrated exceptional performance across a wide range of domains. While their ability to perform reinforcement learning in-context has been established both theoretically and empirically, their behavior in non-stationary environments remains less understood. In this study, we address this gap by showing that transformers can achieve nearly optimal dynamic regret bounds in non-stationary settings. We prove that transformers are capable of approximating strategies used to handle non-stationary environments and can learn the approximator in the in-context learning setup. Our experiments further show that transformers can match or even outperform existing expert algorithms in such environments.
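To make "strategies used to handle non-stationary environments" concrete, here is a toy sketch; it is not the paper's construction, and this summary does not say which strategies the transformers approximate. It assumes a hypothetical two-armed piecewise-stationary bandit (the single-state special case of RL) with noise-free rewards, where the best arm flips halfway through. Two greedy learners with the same periodic exploration are compared: one averages every past reward of an arm, the other averages only a sliding window of that arm's latest rewards, a common device for non-stationarity. All names (`means_at`, `run_greedy`, `window`, `explore_every`) are hypothetical.

```python
from collections import deque
import math


def means_at(t, switch):
    """Hypothetical piecewise-stationary 2-armed bandit (noise-free):
    arm 0 is best before `switch`, arm 1 is best from `switch` onward."""
    return (0.75, 0.25) if t < switch else (0.25, 0.75)


def dynamic_regret(actions, switch):
    """Sum over rounds of (best mean at round t) - (mean of the chosen arm)."""
    return sum(max(means_at(t, switch)) - means_at(t, switch)[a]
               for t, a in enumerate(actions))


def static_regret(actions, switch):
    """Regret against the single best fixed arm in hindsight."""
    horizon = len(actions)
    best_fixed = max(sum(means_at(t, switch)[arm] for t in range(horizon))
                     for arm in range(2))
    earned = sum(means_at(t, switch)[a] for t, a in enumerate(actions))
    return best_fixed - earned


def run_greedy(horizon, switch, window=None, explore_every=10):
    """Greedy play with periodic, alternating forced exploration.

    window=None keeps every observed reward (a stationary estimator);
    an integer keeps only that many of each arm's latest rewards
    (a sliding-window estimator)."""
    history = [deque(maxlen=window), deque(maxlen=window)]
    actions = []
    for t in range(horizon):
        if t % explore_every == 0:
            # Forced exploration: alternate between arm 0 and arm 1.
            arm = (t // explore_every) % 2
        else:
            # Unseen arms get an infinite estimate so each is tried once.
            est = [sum(h) / len(h) if h else math.inf for h in history]
            # max() returns the first maximal arm, so ties go to arm 0.
            arm = max(range(2), key=lambda a: est[a])
        actions.append(arm)
        history[arm].append(means_at(t, switch)[arm])
    return actions


if __name__ == "__main__":
    horizon, switch = 1000, 500
    stationary = run_greedy(horizon, switch, window=None)
    sliding = run_greedy(horizon, switch, window=5)
    print("stationary estimator:", dynamic_regret(stationary, switch))
    print("sliding window:      ", dynamic_regret(sliding, switch))
    # Always playing arm 0 has zero static regret here but large dynamic regret.
    always_zero = [0] * horizon
    print("always arm 0:", static_regret(always_zero, switch),
          dynamic_regret(always_zero, switch))
```

After the switch, the all-history average for the formerly best arm decays too slowly to fall below the other arm's estimate within the horizon, so the stationary learner keeps choosing the wrong arm; the windowed estimate forgets the old rewards within a few pulls. The last line shows why dynamic regret is the natural target here: a policy can have zero static regret while still losing reward in every round after the change.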

Videos

Optimal Dynamic Regret by Transformers for Non-Stationary Reinforcement Learning (SlidesLive)