Central Limit Theorem for Two-Time-Scale Approximate Distributionally Robust RL

Shengbo Wang; Zexi Zhang

arXiv:2605.08417 · cs.LG · May 12, 2026

TL;DR

This paper develops a new model-free algorithm for distributionally robust reinforcement learning (DRRL) in the small-ambiguity regime, proves its convergence and a central limit theorem, and validates both results with numerical experiments.

Contribution

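The paper introduces an approximate robust Bellman equation for DRRL and a two-time-scale stochastic approximation algorithm, Mean-Variance Stochastic Approximation (MVSA), that learns its fixed point, with a convergence proof and a central limit theorem (CLT).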
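To make the two-time-scale idea concrete, here is a toy sketch in Python. It is not the paper's MVSA algorithm: the target functional (mean minus a multiple of the standard deviation), the step-size schedules, the constant `c`, and the Gaussian sampler are all assumptions chosen for illustration. The point it shows is that a functional which is nonlinear in the sampling distribution cannot be estimated without bias by plugging in one sample, but a fast iterate can track the needed moments while a slow iterate averages the resulting plug-in value.

```python
import math
import random


def two_time_scale_mean_minus_std(sample, n_steps=200_000, c=0.5, seed=0):
    """Toy two-time-scale stochastic approximation (illustrative only).

    Target: theta* = E[X] - c * sqrt(Var[X]), which is nonlinear in the
    distribution of X, so a one-sample plug-in update would be biased.
    """
    rng = random.Random(seed)
    m1, m2, theta = 0.0, 0.0, 0.0  # fast: first and second moments; slow: target
    for n in range(1, n_steps + 1):
        x = sample(rng)            # one fresh sample per iteration
        beta = 1.0 / n ** 0.6      # fast step size (hypothetical schedule)
        alpha = 1.0 / n            # slow step size, so alpha / beta -> 0
        m1 += beta * (x - m1)      # fast iterate: running estimate of E[X]
        m2 += beta * (x * x - m2)  # fast iterate: running estimate of E[X^2]
        var = max(m2 - m1 * m1, 0.0)  # clamp guards against rounding error
        theta += alpha * (m1 - c * math.sqrt(var) - theta)  # slow iterate
    return theta


def draw(rng):
    # Hypothetical sample distribution: Gaussian, mean 1, standard deviation 2.
    return rng.gauss(1.0, 2.0)


estimate = two_time_scale_mean_minus_std(draw)
target = 1.0 - 0.5 * 2.0  # E[X] - c * sd(X) for the sampler above
```

Because the slow step size decays faster than the fast one, the slow iterate effectively sees moment estimates that have already settled, which is the separation that two-time-scale analyses typically rely on.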

Findings

1. The proposed MVSA algorithm converges to the fixed point of the approximate Bellman equation.

2. A central limit theorem characterizes the asymptotic distribution of the main iterate.

3. Numerical experiments validate the theoretical convergence and CLT results.

Abstract

Designing model-free algorithms for distributionally robust reinforcement learning (DRRL) poses fundamental challenges. The robust Bellman operator is nonlinear in the transition kernel, which makes one-sample Bellman updates biased, while the adversarial optimization underlying robustness makes robust evaluation computationally demanding. To address these difficulties, we consider the natural small-ambiguity regime under Kullback-Leibler ambiguity sets and propose an approximate DRRL framework based on a first-order expansion of the relevant robust functional. This yields an approximate robust Bellman equation that removes the adversarial optimization while remaining first-order accurate in the ambiguity radius. To learn the fixed point of this approximate equation, we propose Mean-Variance Stochastic Approximation (MVSA), a model-free algorithm that uses only one-sample updates. This…
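As a rough numerical illustration of the small-ambiguity regime, the Python sketch below compares the exact worst-case mean over a Kullback-Leibler ball, computed from the standard dual representation sup_{a > 0} { -a log E_P[exp(-V/a)] - a δ }, with the familiar mean-minus-standard-deviation approximation E_P[V] - sqrt(2 δ Var_P[V]). That this approximation coincides with the expansion used in the paper is an assumption; the value vector, probabilities, radii, and function names are hypothetical.

```python
import math


def kl_robust_mean_dual(values, probs, delta):
    """Worst-case mean over {Q : KL(Q || P) <= delta} via the standard dual
    sup_{a > 0} { -a * log E_P[exp(-V / a)] - a * delta }."""
    v_min = min(values)

    def dual(a):
        # Shift by v_min so every exponent is <= 0 (numerically stable).
        s = sum(p * math.exp(-(v - v_min) / a) for v, p in zip(values, probs))
        return v_min - a * math.log(s) - a * delta

    # The dual objective is concave in a, so golden-section search applies.
    lo, hi = 1e-6, 1e4
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(200):
        a1 = hi - ratio * (hi - lo)
        a2 = lo + ratio * (hi - lo)
        if dual(a1) < dual(a2):
            lo = a1
        else:
            hi = a2
    return dual(0.5 * (lo + hi))


def mean_variance_approx(values, probs, delta):
    """Approximation E[V] - sqrt(2 * delta * Var[V]) (assumed form)."""
    mean = sum(p * v for v, p in zip(values, probs))
    var = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    return mean - math.sqrt(2.0 * delta * var)


# Hypothetical next-state values and nominal transition probabilities.
values = [0.0, 1.0, 2.0, 3.0]
probs = [0.1, 0.2, 0.3, 0.4]  # nominal mean 2.0, variance 1.0

gap_large = abs(kl_robust_mean_dual(values, probs, 1e-2)
                - mean_variance_approx(values, probs, 1e-2))
gap_small = abs(kl_robust_mean_dual(values, probs, 1e-4)
                - mean_variance_approx(values, probs, 1e-4))
```

In this toy instance the gap between the two quantities shrinks as the radius decreases, which is the behavior one would expect from an approximation that avoids the inner optimization while staying accurate for small ambiguity.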
