Distributionally Robust Self Paced Curriculum Reinforcement Learning
Anirudh Satheesh, Keenan Powell, Vaneet Aggarwal

TL;DR
This paper introduces a novel reinforcement learning method that adaptively adjusts robustness to improve stability and performance under distribution shifts, outperforming fixed robustness strategies.
Contribution
It proposes DR-SPCRL, a curriculum approach that dynamically schedules the robustness budget, balancing performance and robustness during training.
Findings
Achieves 11.8% higher episodic return under perturbations.
Stabilizes training compared to fixed robustness methods.
Nearly doubles the performance of nominal RL algorithms.
Abstract
A central challenge in reinforcement learning is that policies trained in controlled environments often fail under distribution shifts at deployment into real-world environments. Distributionally Robust Reinforcement Learning (DRRL) addresses this by optimizing for worst-case performance within an uncertainty set defined by a robustness budget. However, fixing this budget results in a tradeoff between performance and robustness: small values yield high nominal performance but weak robustness, while large values can result in instability and overly conservative policies. We propose Distributionally Robust Self-Paced Curriculum Reinforcement Learning (DR-SPCRL), a method that overcomes this limitation by treating the budget as a continuous curriculum. DR-SPCRL adaptively schedules the robustness budget according to the agent's progress, enabling a balance between nominal and…
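To make the budget tradeoff concrete, here is a minimal sketch of a worst-case expectation over an uncertainty set. It assumes a total-variation ball around a nominal distribution, which is an illustrative choice and not necessarily the uncertainty set used in the paper; the function name `worst_case_value` and the toy numbers are hypothetical.

```python
import numpy as np

def worst_case_value(p, v, eps):
    """Worst-case expectation of v over distributions q with TV(p, q) <= eps.

    Assumption: the uncertainty set is a total-variation ball of radius eps
    around the nominal distribution p (illustrative, not the paper's choice).
    """
    p = np.asarray(p, dtype=float)
    v = np.asarray(v, dtype=float)
    q = p.copy()
    worst = int(np.argmin(v))  # outcome the adversary shifts mass onto
    budget = float(eps)
    # Take probability mass from the highest-value outcomes first.
    for i in np.argsort(-v):
        if i == worst or budget <= 0:
            continue
        moved = min(q[i], budget)
        q[i] -= moved
        q[worst] += moved
        budget -= moved
    return float(q @ v)

# Toy next-state distribution and state values (hypothetical numbers).
p = [0.5, 0.3, 0.2]
v = [1.0, 0.0, 2.0]
for eps in (0.0, 0.1, 0.3, 1.0):
    print(eps, worst_case_value(p, v, eps))
```

With a budget of zero the result is the nominal expectation; as the budget grows, the worst-case value can only decrease, which is the conservatism that a fixed large budget imposes throughout training.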
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques
