Distributionally Robust Self Paced Curriculum Reinforcement Learning

Anirudh Satheesh; Keenan Powell; Vaneet Aggarwal

arXiv:2511.05694·cs.LG·March 10, 2026

Open Access

TL;DR

This paper introduces a reinforcement learning method that adaptively adjusts the robustness budget during training to improve stability and performance under distribution shifts, outperforming strategies that keep the budget fixed.

Contribution

The paper proposes DR-SPCRL, a self-paced curriculum approach that dynamically schedules the robustness budget $ϵ$, balancing nominal performance and robustness during training.

Findings

01 Achieves 11.8% higher episodic return under perturbations.

02 Stabilizes training compared to fixed robustness methods.

03 Nearly doubles the performance of nominal RL algorithms.

Abstract

A central challenge in reinforcement learning is that policies trained in controlled environments often fail under distribution shifts at deployment into real-world environments. Distributionally Robust Reinforcement Learning (DRRL) addresses this by optimizing for worst-case performance within an uncertainty set defined by a robustness budget $ϵ$. However, fixing $ϵ$ results in a tradeoff between performance and robustness: small values yield high nominal performance but weak robustness, while large values can result in instability and overly conservative policies. We propose Distributionally Robust Self-Paced Curriculum Reinforcement Learning (DR-SPCRL), a method that overcomes this limitation by treating $ϵ$ as a continuous curriculum. DR-SPCRL adaptively schedules the robustness budget according to the agent's progress, enabling a balance between nominal and…
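The abstract describes scheduling $ϵ$ according to the agent's progress, but this excerpt does not give the update rule. As a rough illustration only, the sketch below uses a hypothetical threshold heuristic, which is not the DR-SPCRL algorithm itself: the budget grows by a fixed step whenever the return measured under the current budget stays within a chosen fraction of the nominal return, and holds otherwise. The class name `EpsilonCurriculum`, the parameters `eps_max`, `step`, and `ready_ratio`, and the sample returns are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class EpsilonCurriculum:
    """Hypothetical threshold-based schedule for the robustness budget.

    This is an illustrative stand-in, not the update rule from the paper.
    """

    eps_max: float = 1.0      # largest budget the curriculum may reach (assumed)
    step: float = 0.05        # increment applied when the agent is ready (assumed)
    ready_ratio: float = 0.8  # required robust/nominal return ratio (assumed)
    eps: float = 0.0          # current budget; 0 corresponds to nominal training

    def update(self, nominal_return: float, robust_return: float) -> float:
        """Raise eps only when robust return keeps pace with nominal return."""
        ready = (
            nominal_return > 0
            and robust_return >= self.ready_ratio * nominal_return
        )
        if ready:
            # Grow the budget, but never beyond the configured ceiling.
            self.eps = min(self.eps_max, self.eps + self.step)
        return self.eps


if __name__ == "__main__":
    curriculum = EpsilonCurriculum()
    # Made-up (nominal, robust) return pairs, for illustration only.
    for nominal, robust in [(100.0, 90.0), (100.0, 60.0), (120.0, 110.0)]:
        print(curriculum.update(nominal, robust))
```

With this heuristic the budget starts at the nominal setting and only increases once the agent copes with the current level of perturbation, which mirrors the small-to-large progression the abstract motivates. A real implementation would replace the threshold test with the paper's own progress criterion.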

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques