Central Limit Theorem for Two-Time-Scale Approximate Distributionally Robust RL
Shengbo Wang, Zexi Zhang

TL;DR
This paper develops a model-free algorithm for distributionally robust reinforcement learning (DRRL) in the small-ambiguity regime under Kullback-Leibler ambiguity sets, proves its convergence and a central limit theorem, and validates both results with numerical experiments.
Contribution
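The paper introduces an approximate robust Bellman equation, obtained from a first-order expansion of the robust functional, and a two-time-scale stochastic approximation algorithm for learning its fixed point, with proven convergence and a central limit theorem (CLT) for DRRL.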
Findings
The proposed Mean-Variance Stochastic Approximation (MVSA) algorithm converges to the fixed point of the approximate robust Bellman equation.
A central limit theorem characterizes the asymptotic distribution of the main iterate.
Numerical experiments validate the theoretical convergence and CLT results.
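This summary does not state the MVSA update rules, so the following is only a minimal sketch of what a two-time-scale, one-sample scheme of this kind could look like; it is not the authors' algorithm. It assumes a mean-minus-standard-deviation form of the approximate robust Bellman equation, V(s) = r(s) + gamma * (E[V(s')] - sqrt(2*delta) * Std[V(s')]), and the 3-state chain, rewards, and step-size schedules are all hypothetical. Fast iterates track the mean and second moment of V(s'), while the slow main iterate V moves toward the resulting target.

```python
import math
import random

# Hypothetical 3-state Markov reward process under a fixed policy
# (all numbers are illustrative assumptions, not taken from the paper).
P = [[0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3],
     [0.3, 0.3, 0.4]]
R = [1.0, 0.0, 2.0]
GAMMA, DELTA = 0.8, 0.01
C = math.sqrt(2.0 * DELTA)  # assumed first-order penalty coefficient


def approx_bellman(v):
    """Assumed approximate robust operator: r + gamma * (mean - C * std)."""
    out = []
    for s in range(3):
        m = sum(P[s][j] * v[j] for j in range(3))
        u = sum(P[s][j] * v[j] ** 2 for j in range(3))
        out.append(R[s] + GAMMA * (m - C * math.sqrt(max(u - m * m, 0.0))))
    return out


def approx_fixed_point(iters=500):
    """Model-based reference: iterate the operator (a contraction here,
    since GAMMA * (1 + C) < 1 for these parameter values)."""
    v = [0.0, 0.0, 0.0]
    for _ in range(iters):
        v = approx_bellman(v)
    return v


def two_time_scale(n_steps=100_000, seed=0):
    """One-sample updates: fast trackers (m, u) and a slow main iterate v."""
    rng = random.Random(seed)
    v, m, u = [0.0] * 3, [0.0] * 3, [0.0] * 3
    states = range(3)
    for n in range(n_steps):
        beta = 1.0 / (n + 1) ** 0.6   # fast step size
        alpha = 10.0 / (n + 100)      # slow step size; alpha / beta -> 0
        for s in states:
            nxt = rng.choices(states, weights=P[s])[0]  # one sampled next state
            m[s] += beta * (v[nxt] - m[s])              # tracks E[V(s')]
            u[s] += beta * (v[nxt] ** 2 - u[s])         # tracks E[V(s')^2]
            sd = math.sqrt(max(u[s] - m[s] ** 2, 0.0))
            v[s] += alpha * (R[s] + GAMMA * (m[s] - C * sd) - v[s])
    return v


v_star = approx_fixed_point()
v_sa = two_time_scale()
print("reference fixed point:", v_star)
print("two-time-scale iterate:", v_sa)
```

The reason for the separate fast trackers is that the standard deviation is a nonlinear function of the transition kernel: plugging a single sample directly into the target would give a standard deviation of zero and hence a biased update, whereas the mean and second moment are both linear in the kernel and can be estimated without bias from one sample at a time.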
Abstract
Designing model-free algorithms for distributionally robust reinforcement learning (DRRL) poses fundamental challenges. The robust Bellman operator is nonlinear in the transition kernel, which makes one-sample Bellman updates biased, while the adversarial optimization underlying robustness makes robust evaluation computationally demanding. To address these difficulties, we consider the natural small-ambiguity regime under Kullback-Leibler ambiguity sets and propose an approximate DRRL framework based on a first-order expansion of the relevant robust functional. This yields an approximate robust Bellman equation that removes the adversarial optimization while remaining first-order accurate in the ambiguity radius. To learn the fixed point of this approximate equation, we propose Mean-Variance Stochastic Approximation (MVSA), a model-free algorithm that uses only one-sample updates. This…
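To make the first-order expansion concrete, here is a small numerical sketch. It assumes the standard small-radius expansion for Kullback-Leibler balls, in which the worst-case mean over {Q : KL(Q||P) <= delta} is approximately E_P[V] - sqrt(2 * delta * Var_P[V]); the exact form used in the paper is not shown in this summary. The exact value is computed from the usual dual representation sup over lambda > 0 of -lambda * log E_P[exp(-V / lambda)] - lambda * delta. The function names and the example distribution are hypothetical.

```python
import math


def kl_robust_mean_exact(p, v, delta, iters=200):
    """Worst-case mean over a KL ball of radius delta around p, via the dual
    sup_{lam > 0} -lam * log E_p[exp(-v / lam)] - lam * delta.
    The dual is concave in lam, so a ternary search on log(lam) suffices."""
    vmin = min(v)

    def dual(lam):
        # Shift by vmin so the exponentials cannot overflow.
        s = sum(pi * math.exp(-(vi - vmin) / lam) for pi, vi in zip(p, v))
        return vmin - lam * math.log(s) - lam * delta

    lo, hi = math.log(1e-6), math.log(1e6)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if dual(math.exp(m1)) < dual(math.exp(m2)):
            lo = m1
        else:
            hi = m2
    return dual(math.exp(0.5 * (lo + hi)))


def kl_robust_mean_first_order(p, v, delta):
    """Assumed first-order approximation: mean - sqrt(2 * delta * variance)."""
    mean = sum(pi * vi for pi, vi in zip(p, v))
    var = sum(pi * (vi - mean) ** 2 for pi, vi in zip(p, v))
    return mean - math.sqrt(2.0 * delta * var)


# Hypothetical next-state distribution and value vector.
p = [0.2, 0.5, 0.3]
v = [1.0, 2.0, 4.0]
for delta in (0.01, 0.0025):
    exact = kl_robust_mean_exact(p, v, delta)
    approx = kl_robust_mean_first_order(p, v, delta)
    print(delta, exact, approx, abs(exact - approx))
```

The approximation needs only the mean and variance of V under the nominal kernel, with no inner optimization, and the gap to the exact value shrinks as delta decreases.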
