Off-Dynamics Reinforcement Learning via Domain Adaptation and Reward Augmented Imitation

Yihong Guo; Yixuan Wang; Yuanyuan Shi; Pan Xu; Anqi Liu

arXiv:2411.09891·cs.LG·November 18, 2024

TL;DR

This paper introduces DARAIL, which combines reward modification with imitation learning to transfer policies across domains with different dynamics and to reduce the performance degradation that occurs when a source-trained policy is deployed in the target domain.

Contribution

The paper proposes DARAIL, integrating reward modification with imitation learning for better policy transfer in off-dynamics reinforcement learning scenarios.

Findings

01. DARAIL outperforms pure reward modification methods.

02. DARAIL surpasses baseline methods in benchmark environments.

03. Theoretical error bounds support the method's effectiveness.

Abstract

Training a policy in a source domain for deployment in the target domain under a dynamics shift can be challenging, often resulting in performance degradation. Previous work tackles this challenge by training on the source domain with modified rewards derived by matching distributions between the source and the target optimal trajectories. However, pure modified rewards only ensure the behavior of the learned policy in the source domain resembles trajectories produced by the target optimal policies, which does not guarantee optimal performance when the learned policy is actually deployed to the target domain. In this work, we propose to utilize imitation learning to transfer the policy learned from the reward modification to the target domain so that the new policy can generate the same trajectories in the target domain. Our approach, Domain Adaptation and Reward Augmented Imitation…
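The reward-modification step that the abstract attributes to previous work can be illustrated with a toy sketch. It assumes one common form of the correction from the off-dynamics literature, in which the source reward is shifted by the log-ratio of target to source transition probabilities; the abstract itself does not state this formula. The 1-D Gaussian dynamics and the names `source_mean`, `target_mean`, and `modified_reward` are hypothetical and chosen only for illustration. The sketch does not cover DARAIL's imitation component.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def source_mean(s, a):
    # Hypothetical source dynamics: the action shifts the state fully.
    return s + a


def target_mean(s, a):
    # Hypothetical target dynamics: the action has half the effect.
    return s + 0.5 * a


def modified_reward(r, s, a, s_next, std=0.1):
    """Source reward plus an assumed correction term:
    log p_target(s'|s,a) - log p_source(s'|s,a).

    Transitions that are plausible under the target dynamics get a bonus;
    transitions that only the source dynamics would produce are penalized.
    """
    delta = (gaussian_log_pdf(s_next, target_mean(s, a), std)
             - gaussian_log_pdf(s_next, source_mean(s, a), std))
    return r + delta


if __name__ == "__main__":
    # A transition the target dynamics would produce is rewarded more.
    print(modified_reward(1.0, s=0.0, a=1.0, s_next=0.5))
    # A transition typical only of the source dynamics is penalized.
    print(modified_reward(1.0, s=0.0, a=1.0, s_next=1.0))
```

In practice the two transition densities are not known in closed form and would have to be estimated from data. As the abstract notes, a policy trained on such modified rewards is only shaped to behave like target-optimal trajectories while running in the source domain, which is the gap the imitation step is meant to close.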

Code & Models

Repositories

guoyihonggyh/Off-Dynamics-Reinforcement-Learning-via-Domain-Adaptation-and-Reward-Augmented-Imitation (PyTorch, official)

Videos

Off-Dynamics Reinforcement Learning via Domain Adaptation and Reward Augmented Imitation (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics