Off-Dynamics Reinforcement Learning via Domain Adaptation and Reward Augmented Imitation
Yihong Guo, Yixuan Wang, Yuanyuan Shi, Pan Xu, Anqi Liu

TL;DR
This paper introduces DARAIL, which combines reward augmentation with imitation learning to transfer a policy trained in a source domain to a target domain with different dynamics, reducing the performance degradation that the dynamics shift would otherwise cause.
Contribution
The paper proposes DARAIL, which first trains a policy in the source domain with modified rewards and then uses imitation learning to transfer that policy's behavior to the target domain in off-dynamics reinforcement learning.
Findings
DARAIL outperforms pure reward modification methods.
DARAIL surpasses baseline methods in benchmark environments.
Theoretical error bounds support the method's effectiveness.
Abstract
Training a policy in a source domain for deployment in the target domain under a dynamics shift can be challenging, often resulting in performance degradation. Previous work tackles this challenge by training on the source domain with modified rewards derived by matching distributions between the source and the target optimal trajectories. However, pure modified rewards only ensure the behavior of the learned policy in the source domain resembles trajectories produced by the target optimal policies, which does not guarantee optimal performance when the learned policy is actually deployed to the target domain. In this work, we propose to utilize imitation learning to transfer the policy learned from the reward modification to the target domain so that the new policy can generate the same trajectories in the target domain. Our approach, Domain Adaptation and Reward Augmented Imitation…
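The reward modification step that the abstract attributes to previous work can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes one common form of the correction, a log-ratio of target to source transition probabilities added to the task reward, and it assumes both dynamics are known 1-D Gaussians. All names (`source_mean`, `target_mean`, `modified_reward`) and the dynamics themselves are hypothetical, chosen only for illustration; in practice the transition probabilities are not available and the ratio would have to be estimated.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


# Hypothetical dynamics (assumption): the action has full effect in the
# source domain and only half effect in the target domain.
def source_mean(s, a):
    return s + a


def target_mean(s, a):
    return s + 0.5 * a


def modified_reward(r, s, a, s_next, std=0.1):
    """Task reward plus a log-ratio correction (assumed form, see lead-in).

    The correction is positive for source transitions that are more likely
    under the target dynamics and negative for transitions that are less
    likely there.
    """
    log_p_tgt = gaussian_logpdf(s_next, target_mean(s, a), std)
    log_p_src = gaussian_logpdf(s_next, source_mean(s, a), std)
    return r + (log_p_tgt - log_p_src)


if __name__ == "__main__":
    # A transition that looks like target dynamics gets a bonus.
    print(modified_reward(0.0, 0.0, 1.0, 0.5))
    # A transition that only the source dynamics would produce is penalized.
    print(modified_reward(0.0, 0.0, 1.0, 1.0))
```

A correction of this kind only shapes behavior inside the source domain, which is the limitation the abstract points out and the reason the paper adds an imitation learning step on top.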
Taxonomy
Topics: Reinforcement Learning in Robotics
