Transferable Reward Learning by Dynamics-Agnostic Discriminator Ensemble
Fan-Ming Luo, Xingchen Cao, Rong-Jun Qin, Yang Yu

TL;DR
This paper introduces DARL, a dynamics-agnostic discriminator ensemble method for reward learning in imitation learning, enabling transferability across different environments by decoupling reward functions from dynamics.
Contribution
DARL is the first method to learn both state-only and state-action reward functions that are transferable across environments by decoupling rewards from dynamics using a discriminator ensemble.
Findings
DARL outperforms existing methods in transferred MuJoCo tasks.
It effectively recovers reward functions in environments with changed dynamics.
DARL handles both state-only and state-action reward scenarios.
Abstract
Recovering a reward function from expert demonstrations is a fundamental problem in reinforcement learning. The recovered reward function captures the motivation of the expert. Agents can imitate experts by following these reward functions in their own environments, which is known as apprenticeship learning. However, agents may face environments that differ from those of the demonstrations and therefore need transferable reward functions. Classical reward learning methods such as inverse reinforcement learning (IRL) or, equivalently, adversarial imitation learning (AIL) recover reward functions that are coupled with the training dynamics and are therefore hard to transfer. Previous dynamics-agnostic reward learning methods rely on assumptions, such as requiring the reward function to be state-only, that restrict their applicability. In this work, we present a dynamics-agnostic discriminator-ensemble reward learning…
