Transferable Reward Learning by Dynamics-Agnostic Discriminator Ensemble

Fan-Ming Luo; Xingchen Cao; Rong-Jun Qin; Yang Yu

arXiv:2206.00238·cs.LG·June 27, 2024

Open Access

TL;DR

This paper introduces DARL, a dynamics-agnostic discriminator-ensemble method for learning reward functions from expert demonstrations. By decoupling the learned reward from the training dynamics, it produces rewards that transfer to environments whose dynamics differ.

Contribution

DARL is the first method to learn both state-only and state-action reward functions that are transferable across environments by decoupling rewards from dynamics using a discriminator ensemble.

Findings

01. DARL outperforms existing methods in transferred MuJoCo tasks.

02. It effectively recovers reward functions in environments with changed dynamics.

03. DARL handles both state-only and state-action reward scenarios.

Abstract

Recovering a reward function from expert demonstrations is a fundamental problem in reinforcement learning. The recovered reward function captures the motivation of the expert. Agents can imitate experts by following these reward functions in their own environment, which is known as apprenticeship learning. However, agents may face environments that differ from the one in which the demonstrations were collected, and therefore need transferable reward functions. Classical reward learning methods such as inverse reinforcement learning (IRL) or, equivalently, adversarial imitation learning (AIL) recover reward functions that are coupled with the training dynamics and are therefore hard to transfer. Previous dynamics-agnostic reward learning methods rely on assumptions, such as requiring the reward function to be state-only, that restrict their applicability. In this work, we present a dynamics-agnostic discriminator-ensemble reward learning…
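
The visible part of the abstract does not describe DARL's training procedure, so the following is only a generic illustration of how adversarial imitation learning commonly turns a discriminator's output into a reward, not the authors' method. It assumes the widely used form r = log D - log(1 - D), and it assumes, purely for illustration, that an ensemble's rewards are combined by a plain average; the function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ail_reward(d: float, eps: float = 1e-8) -> float:
    """Reward from a discriminator probability d = D(s, a).

    Uses the common AIL form log D - log(1 - D); d is clipped
    to (eps, 1 - eps) to keep the logarithms finite.
    """
    d = min(max(d, eps), 1.0 - eps)
    return math.log(d) - math.log(1.0 - d)


def ensemble_reward(logits: list[float]) -> float:
    """Combine rewards from several discriminators.

    Each entry of `logits` is one discriminator's raw score for the
    same input. Averaging is an assumption made for this sketch.
    """
    rewards = [ail_reward(sigmoid(z)) for z in logits]
    return sum(rewards) / len(rewards)
```

With a sigmoid output layer, log D - log(1 - D) reduces to the discriminator's logit, so under these assumptions the ensemble reward is simply the mean logit across members.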
