Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories

Qinqing Zheng; Mikael Henaff; Brandon Amos; Aditya Grover

arXiv:2210.06518 · cs.LG · June 23, 2023 · 1 cites

TL;DR

This paper introduces a semi-supervised offline reinforcement learning framework where unlabelled trajectories are augmented with proxy labels generated via an inverse dynamics model, enabling effective learning even with limited labelled data.

Contribution

It proposes a simple meta-algorithmic pipeline combining inverse dynamics models with offline RL, demonstrating strong empirical results on benchmarks with minimal labelled data.

Findings

01

The pipeline can match fully labelled performance when only 10% of the data is labelled.

02

The pipeline is flexible and effective across various offline RL algorithms.

03

Empirical study reveals key data and algorithmic factors influencing performance.

Abstract

Natural agents can effectively learn from multiple data sources that differ in size, quality, and types of measurements. We study this heterogeneity in the context of offline reinforcement learning (RL) by introducing a new, practically motivated semi-supervised setting. Here, an agent has access to two sets of trajectories: labelled trajectories containing state, action and reward triplets at every timestep, along with unlabelled trajectories that contain only state and reward information. For this setting, we develop and study a simple meta-algorithmic pipeline that learns an inverse dynamics model on the labelled data to obtain proxy-labels for the unlabelled data, followed by the use of any offline RL algorithm on the true and proxy-labelled trajectories. Empirically, we find this simple pipeline to be highly successful: on several D4RL benchmarks (Fu et al., 2020), certain…
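The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. This is a toy stand-in, not the authors' implementation: it assumes scalar states and actions, a hypothetical linear environment `s_next = s + 0.5 * a`, and a least-squares inverse dynamics model in place of whatever model the paper trains. The names `rollout`, `fit_inverse_dynamics`, and `proxy_label` are invented for illustration, and the final offline RL step is left out because any offline RL algorithm could consume the combined dataset.

```python
import random


def rollout(rng, length):
    """Toy environment (an assumption): s_next = s + 0.5 * a, reward = -|s|."""
    s, traj = 0.0, []
    for _ in range(length):
        a = rng.uniform(-1.0, 1.0)
        traj.append((s, a, -abs(s)))
        s = s + 0.5 * a
    return traj


def fit_inverse_dynamics(labelled):
    """Fit a ~ w * (s_next - s) + b by ordinary least squares.

    `labelled` is a list of trajectories of (state, action, reward) tuples.
    """
    xs, ys = [], []
    for traj in labelled:
        for (s, a, _), (s_next, _, _) in zip(traj, traj[1:]):
            xs.append(s_next - s)
            ys.append(a)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return lambda s, s_next: w * (s_next - s) + b


def proxy_label(unlabelled, idm):
    """Turn (state, reward) trajectories into (state, proxy_action, reward).

    The last timestep of each trajectory is dropped because it has no
    successor state for the inverse dynamics model to condition on.
    """
    out = []
    for traj in unlabelled:
        out.append([(s, idm(s, s_next), r)
                    for (s, r), (s_next, _) in zip(traj, traj[1:])])
    return out


rng = random.Random(0)
labelled = [rollout(rng, 20) for _ in range(5)]    # small labelled set
hidden = [rollout(rng, 20) for _ in range(20)]     # true actions kept only for checking
unlabelled = [[(s, r) for s, _, r in traj] for traj in hidden]

idm = fit_inverse_dynamics(labelled)
proxy = proxy_label(unlabelled, idm)
combined = labelled + proxy  # dataset an offline RL algorithm would train on

# Largest gap between proxy actions and the hidden true actions.
max_err = max(abs(p[1] - h[1])
              for pt, ht in zip(proxy, hidden)
              for p, h in zip(pt, ht))
```

In this toy setting the dynamics are exactly linear, so the proxy actions recover the hidden actions almost perfectly; with real, noisy, high-dimensional data the proxy labels would be approximate, and their quality is one of the factors the page's findings point to.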

Code & Models

Repositories

facebookresearch/ssorl
PyTorch · Official

Videos

Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Evolutionary Algorithms and Applications