Learning Value Functions from Undirected State-only Experience

Matthew Chang; Arjun Gupta; Saurabh Gupta

arXiv:2204.12458·cs.LG·April 27, 2022·1 cites

TL;DR

This paper introduces LAQ, a novel offline reinforcement learning method that learns value functions from state-only experience using latent actions, enabling effective goal-directed behavior and transfer across different embodiments.

Contribution

The paper provides a theoretical characterization of Q-learning in state-only settings and proposes LAQ, a new method that learns value functions from latent actions derived from future prediction models.
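The pipeline described above (assign a discrete latent action to each state-only transition, then run Q-learning on the labeled tuples) can be sketched on a toy problem. This is only an illustration under loud assumptions: the function names are hypothetical, the latent labels come from the displacement s' - s rather than from a learned latent-variable future prediction model as in the paper, and restricting the bootstrap to latent actions observed in the next state is a choice made here for the sketch, not something taken from the source.

```python
import numpy as np

def label_latent_actions(transitions):
    """Toy stand-in (assumption) for a learned latent-action model:
    assign one discrete label per distinct displacement s' - s."""
    displacements = sorted({s_next - s for s, s_next, _ in transitions})
    index = {d: i for i, d in enumerate(displacements)}
    labeled = [(s, index[s_next - s], s_next, r) for s, s_next, r in transitions]
    return labeled, len(displacements)

def offline_q_learning(labeled, n_states, n_latent, gamma=0.9, lr=0.5, epochs=200):
    """Tabular Q-learning over a fixed dataset of (s, latent_a, s', r) tuples."""
    Q = np.zeros((n_states, n_latent))
    # Track which (state, latent action) pairs appear in the data.
    seen = np.zeros((n_states, n_latent), dtype=bool)
    for s, a, _, _ in labeled:
        seen[s, a] = True
    for _ in range(epochs):
        for s, a, s_next, r in labeled:
            # Bootstrap only from latent actions observed in s_next (assumption);
            # states with no outgoing data are treated as terminal.
            next_vals = Q[s_next][seen[s_next]]
            bootstrap = next_vals.max() if next_vals.size else 0.0
            Q[s, a] += lr * (r + gamma * bootstrap - Q[s, a])
    # State value: best observed latent action, or 0 for states with no data.
    best_seen = np.where(seen, Q, -np.inf).max(axis=1)
    V = np.where(seen.any(axis=1), best_seen, 0.0)
    return Q, V

# State-only experience (s, s', r) on a 4-state chain; reaching state 3 pays 1.
transitions = [(0, 1, 0.0), (1, 2, 0.0), (2, 3, 1.0), (1, 0, 0.0), (2, 1, 0.0)]
labeled, n_latent = label_latent_actions(transitions)
Q, V = offline_q_learning(labeled, n_states=4, n_latent=n_latent)
print(V)
```

No action labels enter the computation at any point; the values depend only on the observed transitions and rewards.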

Findings

01. Value functions learned by LAQ correlate highly with value functions learned from ground-truth actions.

02. LAQ demonstrates sample-efficient goal-directed behavior in diverse environments.

03. LAQ outperforms imitation learning and other baselines in experiments.

Abstract

This paper tackles the problem of learning value functions from undirected state-only experience (state transitions without action labels, i.e., (s, s', r) tuples). We first theoretically characterize the applicability of Q-learning in this setting. We show that tabular Q-learning in discrete Markov decision processes (MDPs) learns the same value function under any arbitrary refinement of the action space. This theoretical result motivates the design of Latent Action Q-learning (LAQ), an offline RL method that can learn effective value functions from state-only experience. LAQ learns value functions using Q-learning on discrete latent actions obtained through a latent-variable future prediction model. We show that LAQ can recover value functions that have high correlation with value functions learned using ground-truth actions. Value functions learned using LAQ…
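The refinement claim in the abstract can be checked numerically on a small example. The sketch below is an illustration, not the paper's proof or code. It assumes that "refinement" means each original action is split into one or more relabeled copies with identical transitions and rewards, which is an informal reading rather than the paper's formal definition. It also uses exact Q-value iteration (the fixed point that tabular Q-learning converges to) instead of sampled Q-learning so the result is deterministic. The MDP, `refine_map`, and all function names are made up for the example.

```python
import numpy as np

def q_value_iteration(P, R, gamma=0.9, iters=500):
    """Exact Bellman optimality backups for a deterministic MDP.
    P[a, s] is the next state and R[a, s] the reward for action a in state s."""
    n_actions, n_states = P.shape
    Q = np.zeros((n_actions, n_states))
    for _ in range(iters):
        V = Q.max(axis=0)          # greedy state values
        Q = R + gamma * V[P]       # V[P] looks up the value of each next state
    return Q

# 4-state chain: action 0 moves left, action 1 moves right; state 3 is absorbing.
P = np.array([[0, 0, 1, 3],
              [1, 2, 3, 3]])
R = np.zeros((2, 4))
R[1, 2] = 1.0                      # reward for stepping from state 2 into state 3

# Hypothetical refinement: 5 relabeled actions, each a copy of an original one.
refine_map = np.array([0, 1, 1, 0, 1])
P_ref, R_ref = P[refine_map], R[refine_map]

V_orig = q_value_iteration(P, R).max(axis=0)
V_ref = q_value_iteration(P_ref, R_ref).max(axis=0)
print(V_orig, V_ref)
```

Under these assumptions both action spaces yield the same state values, because the maximum over duplicated actions equals the maximum over the originals.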

Videos

Learning Value Functions from Undirected State-only Experience · SlidesLive

Taxonomy

Topics: Neural dynamics and brain function · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)

Methods: Q-Learning