Off-Dynamics Inverse Reinforcement Learning from Hetero-Domain
Yachen Kang, Jinxin Liu, Xin Cao, Donglin Wang

TL;DR
This paper introduces a novel inverse reinforcement learning approach that learns reward functions transferable across different dynamics domains by incorporating dynamics discrepancy into the discriminator, demonstrated on continuous control tasks.
Contribution
The paper proposes a GAN-inspired IRL method that explicitly accounts for dynamics differences, so that reward functions learned from real-world demonstrations can be transferred to a simulator.
Findings
Effective transfer of reward functions across domains
Scalability to high-dimensional control tasks
Improved imitation performance by considering dynamics discrepancies
Abstract
We propose an approach for inverse reinforcement learning from hetero-domain, which learns a reward function in the simulator by drawing on demonstrations from the real world. The intuition behind the method is that the reward function should not only be oriented toward imitating the experts, but should also encourage actions adjusted for the dynamics difference between the simulator and the real world. To achieve this, the widely used GAN-inspired IRL method is adopted, and its discriminator, which recognizes policy-generated trajectories, is modified with a quantification of the dynamics difference. Training the discriminator yields a transferable reward function suited to the simulator dynamics, which is guaranteed by derivation. Effectively, our method assigns higher rewards to demonstration trajectories that do not exploit discrepancies between the two domains. With…
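The idea of shifting a GAN-style IRL discriminator by a measure of dynamics difference can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the use of a log-ratio of transition probabilities as the dynamics measure, and the choice to add that term to an AIRL-style logit are all hypothetical, and the paper's derived form may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dynamics_log_ratio(log_p_real, log_p_sim):
    # Hypothetical dynamics-difference measure for a transition (s, a, s'):
    # log p_real(s'|s,a) - log p_sim(s'|s,a). It is zero when the domains agree.
    return np.asarray(log_p_real, dtype=float) - np.asarray(log_p_sim, dtype=float)

def discriminator_prob(f_value, log_pi, delta):
    # AIRL-style logit f(s,a,s') - log pi(a|s), shifted by the dynamics term.
    # Adding delta here is an illustrative assumption, not the paper's formula.
    return sigmoid(np.asarray(f_value) - np.asarray(log_pi) + delta)

def reward_from_discriminator(d, eps=1e-8):
    # Common GAN-style IRL reward: log D - log(1 - D).
    return np.log(d + eps) - np.log(1.0 - d + eps)

# Two toy transitions with identical learned score and policy likelihood.
f_value = np.array([1.0, 1.0])
log_pi = np.array([-0.5, -0.5])

# Transition 0: equally likely in both domains.
# Transition 1: likely in the simulator but unlikely in the real world.
delta = dynamics_log_ratio(log_p_real=[-1.0, -3.0], log_p_sim=[-1.0, -1.0])

d = discriminator_prob(f_value, log_pi, delta)
reward = reward_from_discriminator(d)
print(reward)
```

Under these assumptions, the second transition, which relies on a discrepancy between the two domains, receives the lower reward, matching the qualitative behaviour the abstract describes.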
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Model Reduction and Neural Networks
