GRAM: Generalization in Deep RL with a Robust Adaptation Module
James Queeney, Xiaoyi Cai, Alexander Schperberg, Radu Corcodel, Mouhacine Benosman, Jonathan P. How

TL;DR
This paper introduces GRAM, a framework for deep reinforcement learning that enhances generalization across both familiar and novel environment dynamics using a robust adaptation module, validated through simulation and hardware tests.
Contribution
The paper proposes a unified architecture with a robust adaptation module for dynamics generalization in deep RL, combining in-distribution adaptation and out-of-distribution robustness.
Findings
Strong generalization in simulation and hardware tests
Effective handling of both in-distribution and out-of-distribution scenarios
Improved robustness in quadruped robot locomotion
Abstract
The reliable deployment of deep reinforcement learning in real-world settings requires the ability to generalize across a variety of conditions, including both in-distribution scenarios seen during training as well as novel out-of-distribution scenarios. In this work, we present a framework for dynamics generalization in deep reinforcement learning that unifies these two distinct types of generalization within a single architecture. We introduce a robust adaptation module that provides a mechanism for identifying and reacting to both in-distribution and out-of-distribution environment dynamics, along with a joint training pipeline that combines the goals of in-distribution adaptation and out-of-distribution robustness. Our algorithm GRAM achieves strong generalization performance across in-distribution and out-of-distribution scenarios upon deployment, which we demonstrate through…
Peer Reviews
Decision·Submitted to ICLR 2025
- The manuscript is extremely well written and provides clear statements to understand the proposed method.
- The authors demonstrate a successful implementation of recently introduced Epistemic Neural Networks for a relevant and important open problem in robot learning (i.e. generalization over unobservable environment contexts).
- Missing recent related works: the authors should consider citing and discussing recent works [1] and [2] as they present themselves as state-of-the-art methods in Robust RL and Domain Randomization as of 2024. Specifically, DORAEMON [2] tackles the same problem setting as in this work where privileged information is available at training time, and a history of previous observations and actions is used to allow implicit system identification at test time and promote adaptive behavior.
- Limite
- The paper is interesting and the key ideas of adaptive RL and robust RL are easy to follow.
- To the best of my knowledge the robust adaptation module relying on context identification from history and epistemic uncertainty of the policy is novel in the context of generalization in RL.
- The use of a teacher-student training paradigm is nice, and since the student doesn't require privileged information, the approach can be potentially scaled to real world tasks via sim2real.
- The experim
- It is unclear how general the proposed approach is in dealing with different types of variations. The paper is motivated from the perspective of very generic generalization in RL, but the experiments are limited to a single task/environment and correspond only to dynamics variations. What about variations in visual scenes? Generalization to different tasks with the same robot?
- The in-distribution and out-of-distribution generalization motivation is a bit confusing in the introduction. The au
1. The method is simple, novel, and easy to understand. It does not require many code changes relative to vanilla in-context adaptation.
2. The empirical results are comprehensive in the quadruped OOD generalization case, which is persuasive.
1. The paper does not consider other easy-to-implement baselines, e.g. domain invariance prediction, adding noise to the context, or using a larger replay buffer for the context encoder training. I think to show that this specific design is useful, one should also consider other easy-to-implement baselines.
2. GRAM biases towards 0, which is the mean at the beginning of the training. It is unclear if this is the best choice. How about bias toward the mean at the end of the training of the context
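The reviews above describe the robust adaptation module as estimating a latent context from history, using epistemic uncertainty, and biasing the context toward 0. The following is a minimal illustrative sketch of that idea only, not the authors' implementation. It assumes a small ensemble of linear encoders whose disagreement stands in for epistemic uncertainty (the paper uses Epistemic Neural Networks instead), and an exponential shrinkage rule. The names `robust_context`, `encoders`, and `scale` are hypothetical.

```python
import numpy as np


def robust_context(history, encoders, scale=1.0):
    """Shrink the estimated latent context toward 0 as uncertainty grows.

    history:  1-D array, a flattened window of past observations and actions.
    encoders: list of weight matrices (hypothetical linear context encoders).
    scale:    hypothetical sensitivity of the shrinkage to uncertainty.
    """
    # Each encoder maps the history to its own latent context estimate.
    estimates = np.stack([W @ history for W in encoders])
    z_mean = estimates.mean(axis=0)

    # ASSUMPTION: ensemble disagreement is used as a proxy for epistemic
    # uncertainty; this is a swapped-in technique, not the paper's epinet.
    uncertainty = estimates.var(axis=0).mean()

    # ASSUMPTION: exponential shrinkage. alpha is 1 when the encoders agree
    # and decays toward 0 as they disagree, which pulls the context toward 0.
    alpha = np.exp(-scale * uncertainty)
    return alpha * z_mean, alpha


rng = np.random.default_rng(0)
base = rng.normal(size=(4, 8))
history = rng.normal(size=8)

# Encoders that agree mimic dynamics similar to those seen in training.
agreeing = [base.copy() for _ in range(5)]
# Encoders that disagree mimic novel, out-of-distribution dynamics.
disagreeing = [base + rng.normal(scale=2.0, size=base.shape) for _ in range(5)]

z_id, alpha_id = robust_context(history, agreeing)
z_ood, alpha_ood = robust_context(history, disagreeing)
```

In this sketch, agreement among encoders leaves the context estimate essentially unchanged, while disagreement pulls it toward the zero vector. Whether 0 is the best anchor is exactly the question the last reviewer raises.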
Taxonomy
Topics: Human Pose and Action Recognition · Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning
