DERAIL: Diagnostic Environments for Reward And Imitation Learning

Pedro Freire; Adam Gleave; Sam Toyer; Stuart Russell

arXiv:2012.01365 · cs.LG · December 3, 2020 · 1 citation

TL;DR

DERAIL introduces simple diagnostic environments for evaluating reward and imitation learning algorithms. They complement complex, slow benchmarks by isolating specific failures and allowing candidate fixes to be tested quickly.

Contribution

The paper presents a suite of simple, targeted diagnostic tasks for isolating and analyzing specific facets of reward and imitation learning algorithms.

Findings

01  Algorithm performance is highly sensitive to implementation details.

02  The diagnostic suite can pinpoint design flaws in reward learning methods.

03  The suite enables rapid evaluation of candidate solutions.

Abstract

The objective of many real-world tasks is complex and difficult to procedurally specify. This makes it necessary to use reward or imitation learning algorithms to infer a reward or policy directly from human data. Existing benchmarks for these algorithms focus on realism, testing in complex environments. Unfortunately, these benchmarks are slow, unreliable and cannot isolate failures. As a complementary approach, we develop a suite of simple diagnostic tasks that test individual facets of algorithm performance in isolation. We evaluate a range of common reward and imitation learning algorithms on our tasks. Our results confirm that algorithm performance is highly sensitive to implementation details. Moreover, in a case-study into a popular preference-based reward learning implementation, we illustrate how the suite can pinpoint design flaws and rapidly evaluate candidate solutions. The…
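
To make the idea of a diagnostic task concrete, here is a minimal Python sketch of what a task that tests one facet in isolation might look like. It is not taken from the paper or its repositories: the names `ChainEnv`, `rollout_return`, `expert_policy`, and `distracted_policy` are all hypothetical, and the task itself (a short chain whose observation is padded with irrelevant noise) is an assumption chosen purely for illustration.

```python
import random


class ChainEnv:
    """Hypothetical toy task: walk right along a chain to reach the last cell.

    The observation is the true position followed by distractor noise, so the
    only facet probed is robustness to irrelevant observation features.
    """

    def __init__(self, length=5, noise_dims=3, horizon=10, seed=0):
        self.length = length
        self.noise_dims = noise_dims
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.pos = 0
        self.t = 0

    def _obs(self):
        noise = [self.rng.gauss(0.0, 1.0) for _ in range(self.noise_dims)]
        return [float(self.pos)] + noise

    def reset(self):
        self.pos = 0
        self.t = 0
        return self._obs()

    def step(self, action):
        # action is +1 (right) or -1 (left); the position is clipped to the chain
        self.pos = min(self.length - 1, max(0, self.pos + action))
        self.t += 1
        reward = 1.0 if self.pos == self.length - 1 else 0.0
        done = self.t >= self.horizon
        return self._obs(), reward, done


def rollout_return(env, policy):
    """Run one fixed-horizon episode and return the undiscounted return."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


def expert_policy(obs):
    # Optimal behaviour: always move right and ignore the noise features.
    return 1


def distracted_policy(obs):
    # Illustrative failure mode: acts on a noise feature instead of position.
    return 1 if obs[1] > 0 else -1


if __name__ == "__main__":
    print("expert:", rollout_return(ChainEnv(), expert_policy))
    print("distracted:", rollout_return(ChainEnv(), distracted_policy))
```

Because the task is tiny and the optimal return is known in closed form, a learned policy that scores below the expert points to one specific weakness (here, sensitivity to distractor features) rather than to an unknown mixture of causes, and each evaluation takes a fraction of a second.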

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Multimodal Machine Learning Applications