Group Distributionally Robust Reinforcement Learning with Hierarchical Latent Variables
Mengdi Xu, Peide Huang, Yaru Niu, Visak Kumar, Jielin Qiu, Chao Fang, Kuan-Hui Lee, Xuewei Qi, Henry Lam, Bo Li, Ding Zhao

TL;DR
This paper introduces GDR-MDP, a hierarchical framework for multi-task reinforcement learning that balances robustness and average performance by modeling task groups with latent variables, leading to improved robustness in diverse environments.
Contribution
The paper proposes GDR-MDP, a novel hierarchical MDP model that encodes task groups via latent variables and develops deep RL algorithms for it, enhancing robustness under task ambiguity.
Findings
GDR-MDP improves distributional robustness through hierarchical regularization.
The proposed algorithms outperform classic robust training across diverse benchmarks.
Demonstrated effectiveness on Box2D, MuJoCo, and Google football platforms.
Abstract
One key challenge for multi-task reinforcement learning (RL) in practice is the absence of task indicators. Robust RL has been applied to deal with task ambiguity, but may result in over-conservative policies. To balance the worst-case (robustness) and average performance, we propose Group Distributionally Robust Markov Decision Process (GDR-MDP), a flexible hierarchical MDP formulation that encodes task groups via a latent mixture model. GDR-MDP identifies the optimal policy that maximizes the expected return under the worst-possible qualified belief over task groups within an ambiguity set. We rigorously show that GDR-MDP's hierarchical structure improves distributional robustness by adding regularization to the worst possible outcomes. We then develop deep RL algorithms for GDR-MDP for both value-based and policy-based RL methods. Extensive experiments on Box2D control tasks, MuJoCo…
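The worst-case-belief objective described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a finite set of task groups, a known expected return per group for a fixed policy, and a total-variation ball of a given radius around a nominal belief as the ambiguity set. The function names `worst_case_belief` and `robust_value`, the choice of ambiguity set, and all numbers are hypothetical and chosen only for illustration.

```python
def worst_case_belief(nominal, values, radius):
    """Return the belief that minimizes expected return inside a
    total-variation ball of the given radius around `nominal`.

    Assumption (not from the paper): the ambiguity set is
    {q on the simplex : 0.5 * ||q - nominal||_1 <= radius}.
    For a linear objective, the minimizer moves up to `radius` of
    probability mass from the highest-return groups to the
    lowest-return group.
    """
    q = list(nominal)
    worst = min(range(len(values)), key=lambda i: values[i])
    budget = radius
    # Visit groups from highest to lowest return and drain their mass.
    for i in sorted(range(len(values)), key=lambda i: values[i], reverse=True):
        if i == worst or budget <= 0:
            continue
        moved = min(q[i], budget)
        q[i] -= moved
        q[worst] += moved
        budget -= moved
    return q


def robust_value(nominal, values, radius):
    """Expected return under the worst-case belief over task groups."""
    q = worst_case_belief(nominal, values, radius)
    return sum(p * v for p, v in zip(q, values))


# Hypothetical example: three task groups with illustrative returns.
nominal_belief = [0.5, 0.3, 0.2]
group_returns = [10.0, 4.0, 1.0]
average_value = robust_value(nominal_belief, group_returns, 0.0)
robust_at_02 = robust_value(nominal_belief, group_returns, 0.2)
print(average_value, robust_at_02)
```

With radius 0 the sketch reduces to the average-case return under the nominal belief, and as the radius grows it approaches the return of the single worst group. That is the trade-off between average performance and robustness that the abstract describes.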
Taxonomy
Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management · Mobile Crowdsensing and Crowdsourcing
Methods: Dense Connections · 1x1 Convolution · Feedforward Network · Two Time-scale Update Rule · Projection Discriminator · Non-Local Operation · Adam
