LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation

Geon-Hyeong Kim; Jongmin Lee; Youngsoo Jang; Hongseok Yang; Kee-Eung Kim

arXiv:2202.13536·cs.LG·October 19, 2022·1 cites

TL;DR

LobsDICE is an offline learning algorithm that lets an agent imitate expert behavior from state-only expert demonstrations and previously collected transition data, without environment interaction, by minimizing the divergence between the expert's and the agent's stationary state-transition distributions.

Contribution

The paper introduces LobsDICE, an offline learning-from-observation (LfO) method based on convex optimization that learns from state-only expert demonstrations and action-labeled transition data of unknown quality.

Findings

01. LobsDICE outperforms baseline methods in offline LfO tasks.

02. The algorithm minimizes the divergence between the state-transition distributions induced by the expert and by the agent policy.

03. LobsDICE requires solving only a single convex minimization problem.

Abstract

We consider the problem of learning from observation (LfO), in which the agent aims to mimic the expert's behavior from the state-only demonstrations by experts. We additionally assume that the agent cannot interact with the environment but has access to the action-labeled transition data collected by some agents with unknown qualities. This offline setting for LfO is appealing in many real-world scenarios where the ground-truth expert actions are inaccessible and the arbitrary environment interactions are costly or risky. In this paper, we present LobsDICE, an offline LfO algorithm that learns to imitate the expert policy via optimization in the space of stationary distributions. Our algorithm solves a single convex minimization problem, which minimizes the divergence between the two state-transition distributions induced by the expert and the agent policy. Through an extensive set of…
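The quantity the abstract describes, a divergence between the state-transition distributions of the agent and the expert, can be illustrated on a toy problem. The sketch below is not the LobsDICE algorithm: it assumes known tabular dynamics (which the offline setting does not provide), a discounted formulation of the stationary distribution, and the KL divergence as the divergence measure. None of these choices are stated in the text above, and the names `transition_distribution`, `kl`, and the two-state MDP are hypothetical, introduced only for illustration.

```python
import math

def transition_distribution(P, pi, p0, gamma=0.95, iters=2000):
    """Discounted stationary distribution over (s, s') pairs under policy pi.

    P[s][a][t] is the probability of reaching t from s under action a,
    pi[s][a] is the policy, and p0[s] is the initial state distribution.
    """
    n = len(p0)
    # State-to-state kernel under the policy: M[s][t] = sum_a pi(a|s) * P(t|s,a)
    M = [[sum(pi[s][a] * P[s][a][t] for a in range(len(pi[s])))
          for t in range(n)] for s in range(n)]
    # Fixed-point iteration for the discounted state occupancy:
    # d(t) = (1 - gamma) * p0(t) + gamma * sum_s d(s) * M[s][t]
    d = list(p0)
    for _ in range(iters):
        d = [(1 - gamma) * p0[t] + gamma * sum(d[s] * M[s][t] for s in range(n))
             for t in range(n)]
    # Joint distribution over state transitions (s, s')
    return [[d[s] * M[s][t] for t in range(n)] for s in range(n)]

def kl(p, q, eps=1e-12):
    """KL(p || q) for two joint distributions stored as nested lists."""
    total = 0.0
    for p_row, q_row in zip(p, q):
        for p_i, q_i in zip(p_row, q_row):
            if p_i > 0:
                total += p_i * math.log(p_i / max(q_i, eps))
    return total

# Hypothetical toy MDP: 2 states; action 0 stays, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0
    [[0.0, 1.0], [1.0, 0.0]],  # from state 1
]
p0 = [1.0, 0.0]
pi_expert = [[0.1, 0.9], [0.9, 0.1]]   # mostly moves to state 1 and stays there
pi_uniform = [[0.5, 0.5], [0.5, 0.5]]  # candidate agent policy

d_expert = transition_distribution(P, pi_expert, p0)
d_uniform = transition_distribution(P, pi_uniform, p0)

print("KL(uniform || expert):", kl(d_uniform, d_expert))
print("KL(expert  || expert):", kl(d_expert, d_expert))
```

Only states appear in the compared distributions, which is why state-only demonstrations suffice to define the objective. Under these assumptions, a policy that matches the expert's state transitions drives the divergence to zero.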

Code & Models

Videos

LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation · SlidesLive

Taxonomy

Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Advanced Bandit Algorithms Research