Learning What To Do by Simulating the Past

David Lindner; Rohin Shah; Pieter Abbeel; Anca Dragan

arXiv:2104.03946 · cs.LG · May 4, 2021

TL;DR

This paper introduces a method that lets an agent infer human preferences from a single observed state by simulating, backwards in time, the human actions that could have produced it. The agent can then learn skills in complex environments without expensive human feedback.

Contribution

The paper combines a learned feature encoder with learned inverse models to simulate human actions backwards in time, scaling past-trajectory simulation from gridworlds to complex tasks.
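The backward simulation described above can be sketched as a short rollout loop. This is a minimal toy sketch, not the authors' implementation: `encode`, `inverse_policy`, and `inverse_dynamics` are hypothetical hand-written stand-ins for the learned feature encoder and learned inverse models, and the 2-D additive dynamics are an assumption made purely for illustration.

```python
import random
from typing import List, Tuple

State = Tuple[float, float]   # toy 2-D state (assumption for illustration)
Action = Tuple[float, float]  # toy 2-D displacement (assumption)


def encode(state: State) -> State:
    """Hypothetical stand-in for a learned feature encoder (identity map here)."""
    return state


def inverse_policy(features: State, rng: random.Random) -> Action:
    """Hypothetical stand-in for a learned inverse policy.

    Guesses the action taken just before reaching these features.
    Toy assumption: the human was moving in the +x direction with small noise.
    """
    return (1.0 + rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1))


def inverse_dynamics(state: State, action: Action) -> State:
    """Hypothetical stand-in for learned inverse dynamics.

    Toy forward dynamics are s' = s + a, so the previous state is s' - a.
    """
    return (state[0] - action[0], state[1] - action[1])


def simulate_backwards(observed: State, horizon: int,
                       rng: random.Random) -> List[State]:
    """Roll back `horizon` steps from the observed state.

    Returns the sampled past trajectory in chronological order,
    ending at the observed state.
    """
    trajectory = [observed]
    state = observed
    for _ in range(horizon):
        action = inverse_policy(encode(state), rng)
        state = inverse_dynamics(state, action)
        trajectory.append(state)
    trajectory.reverse()
    return trajectory


if __name__ == "__main__":
    for s in simulate_backwards((5.0, 0.0), horizon=4, rng=random.Random(0)):
        print(s)
```

Each call samples one plausible past; in a learned setting the two inverse models would be trained networks operating on encoded features rather than the fixed functions used here.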

Findings

01  Reproduces skills in MuJoCo environments given only a single observed state.

02  Learns from minimal data by inferring the past actions that led to the observed state.

03  Scales past-trajectory simulation from gridworlds to complex environments.

Abstract

Since reward functions are hard to specify, recent work has focused on learning policies from human feedback. However, such approaches are impeded by the expense of acquiring such feedback. Recent work proposed that agents have access to a source of information that is effectively free: in any environment that humans have acted in, the state will already be optimized for human preferences, and thus an agent can extract information about what humans want from the state. Such learning is possible in principle, but requires simulating all possible past trajectories that could have led to the observed state. This is feasible in gridworlds, but how do we scale it to complex tasks? In this work, we show that by combining a learned feature encoder with learned inverse models, we can enable agents to simulate human actions backwards in time to infer what they must have done. The resulting…
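To make concrete why simulating all possible past trajectories is feasible only in small environments, here is a brute-force sketch on a toy 1-D gridworld. The environment (`N_STATES`, `ACTIONS`, `step`) and the function `past_trajectories` are hypothetical names introduced for illustration; this exhaustive enumeration is not the algorithm from the prior work the abstract refers to, only a demonstration of the idea.

```python
from itertools import product
from typing import List

N_STATES = 5        # toy chain of cells 0..4 (assumption for illustration)
ACTIONS = (-1, +1)  # move left or move right


def step(state: int, action: int) -> int:
    """Deterministic toy dynamics: move one cell, stopping at the walls."""
    return min(max(state + action, 0), N_STATES - 1)


def past_trajectories(observed: int, horizon: int) -> List[List[int]]:
    """Enumerate every state sequence of length horizon + 1 ending at `observed`.

    Tries all start states and all action sequences, so the number of
    candidates is N_STATES * len(ACTIONS) ** horizon.
    """
    results = []
    for start in range(N_STATES):
        for actions in product(ACTIONS, repeat=horizon):
            states = [start]
            for a in actions:
                states.append(step(states[-1], a))
            if states[-1] == observed:
                results.append(states)
    return results


if __name__ == "__main__":
    for trajectory in past_trajectories(observed=0, horizon=2):
        print(trajectory)
```

The candidate count grows exponentially with the horizon, and a continuous state space cannot be enumerated this way at all, which is the scaling problem the abstract raises.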

Code & Models

Repositories

HumanCompatibleAI/deep-rlsp · TensorFlow · Official

Videos

Learning What To Do by Simulating the Past · SlidesLive

Taxonomy

Topics

Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Topic Modeling