LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation
Geon-Hyeong Kim, Jongmin Lee, Youngsoo Jang, Hongseok Yang, Kee-Eung Kim

TL;DR
LobsDICE is an offline algorithm that lets an agent imitate expert behavior from state-only expert demonstrations plus action-labeled transition data of unknown quality, with no environment interaction, by minimizing the divergence between the stationary state-transition distributions of the expert and the agent.
Contribution
The paper introduces LobsDICE, an offline learning-from-observation (LfO) method based on convex optimization that learns from state-only expert demonstrations and action-labeled transition data of unknown quality.
Findings
LobsDICE outperforms baseline methods in offline LfO tasks.
The algorithm minimizes the divergence between the state-transition distributions induced by the expert and by the agent policy.
LobsDICE requires solving only a single convex minimization problem.
Abstract
We consider the problem of learning from observation (LfO), in which the agent aims to mimic the expert's behavior from the state-only demonstrations by experts. We additionally assume that the agent cannot interact with the environment but has access to the action-labeled transition data collected by some agents with unknown qualities. This offline setting for LfO is appealing in many real-world scenarios where the ground-truth expert actions are inaccessible and the arbitrary environment interactions are costly or risky. In this paper, we present LobsDICE, an offline LfO algorithm that learns to imitate the expert policy via optimization in the space of stationary distributions. Our algorithm solves a single convex minimization problem, which minimizes the divergence between the two state-transition distributions induced by the expert and the agent policy. Through an extensive set of…
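To make the quantity in the abstract concrete, the following minimal sketch computes the discounted stationary state-transition distribution d(s, s') of a policy in a toy tabular MDP and measures its divergence from the expert's. It is not LobsDICE itself: the paper's method works from offline samples and solves a convex problem, whereas this sketch assumes the dynamics are fully known. The two-state MDP, the discount factor, the choice of KL divergence, and all function names are assumptions made for illustration only.

```python
import math

def transition_matrix(policy, T):
    """State-to-state matrix P[s][s2] = sum_a policy[s][a] * T[s][a][s2]."""
    n = len(T)
    return [[sum(policy[s][a] * T[s][a][s2] for a in range(len(policy[s])))
             for s2 in range(n)] for s in range(n)]

def state_transition_distribution(policy, T, p0, gamma=0.9, iters=2000):
    """Discounted stationary distribution over (s, s') pairs.

    Iterates d = (1 - gamma) * p0 + gamma * P^T d to a fixed point,
    then returns d(s, s') = d(s) * P[s][s'].
    """
    P = transition_matrix(policy, T)
    n = len(p0)
    d = list(p0)
    for _ in range(iters):
        d = [(1 - gamma) * p0[s2] + gamma * sum(d[s] * P[s][s2] for s in range(n))
             for s2 in range(n)]
    return [[d[s] * P[s][s2] for s2 in range(n)] for s in range(n)]

def kl(p, q, eps=1e-12):
    """KL(p || q) over all (s, s') entries; zero-probability entries of p are skipped."""
    return sum(pi * math.log(pi / max(qi, eps))
               for prow, qrow in zip(p, q)
               for pi, qi in zip(prow, qrow) if pi > 0)

# Hypothetical toy MDP: taking action a moves deterministically to state a.
T = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
p0 = [0.5, 0.5]  # assumed initial state distribution

expert_policy = [[0.1, 0.9], [0.1, 0.9]]     # mostly heads to state 1
imitator_policy = [[0.1, 0.9], [0.1, 0.9]]   # same behavior as the expert
other_policy = [[0.9, 0.1], [0.9, 0.1]]      # mostly heads to state 0

d_expert = state_transition_distribution(expert_policy, T, p0)
d_imitator = state_transition_distribution(imitator_policy, T, p0)
d_other = state_transition_distribution(other_policy, T, p0)

kl_imitator = kl(d_imitator, d_expert)
kl_other = kl(d_other, d_expert)
print("KL(imitator || expert):", kl_imitator)
print("KL(other    || expert):", kl_other)
```

The comparison uses only (s, s') pairs and never the expert's actions, which mirrors the state-only nature of the LfO setting described above. A policy that reproduces the expert's state transitions drives the divergence toward zero, while a policy that visits different transitions yields a strictly positive value.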
Taxonomy
Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Advanced Bandit Algorithms Research
