Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories
Qinqing Zheng, Mikael Henaff, Brandon Amos, Aditya Grover

TL;DR
This paper introduces a semi-supervised offline reinforcement learning framework where unlabelled trajectories are augmented with proxy labels generated via an inverse dynamics model, enabling effective learning even with limited labelled data.
Contribution
It proposes a simple meta-algorithmic pipeline combining inverse dynamics models with offline RL, demonstrating strong empirical results on benchmarks with minimal labelled data.
Findings
On several D4RL benchmarks, the pipeline matches the performance of training on fully labelled data while using only 10% labelled data.
The pipeline is flexible and effective across various offline RL algorithms.
The empirical study identifies the data and algorithmic factors that influence performance.
Abstract
Natural agents can effectively learn from multiple data sources that differ in size, quality, and types of measurements. We study this heterogeneity in the context of offline reinforcement learning (RL) by introducing a new, practically motivated semi-supervised setting. Here, an agent has access to two sets of trajectories: labelled trajectories containing state, action and reward triplets at every timestep, along with unlabelled trajectories that contain only state and reward information. For this setting, we develop and study a simple meta-algorithmic pipeline that learns an inverse dynamics model on the labelled data to obtain proxy-labels for the unlabelled data, followed by the use of any offline RL algorithm on the true and proxy-labelled trajectories. Empirically, we find this simple pipeline to be highly successful: on several D4RL benchmarks (Fu et al., 2020), certain…
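The pipeline described in the abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: states and actions are discrete, the inverse dynamics model is a majority-vote lookup table standing in for a learned model, and the function names (`fit_inverse_dynamics`, `proxy_label`, `build_offline_dataset`) and the `default_action` fallback are hypothetical. The combined dataset would then be handed to any offline RL algorithm.

```python
from collections import Counter, defaultdict


def fit_inverse_dynamics(labelled):
    """Toy tabular inverse dynamics model (an assumption, not the paper's model).

    For every observed (state, next_state) pair in the labelled trajectories,
    remember the action that most often produced that transition.
    """
    counts = defaultdict(Counter)
    for traj in labelled:
        # Each labelled step is (state, action, reward); pair it with its successor.
        for (s, a, _), (s_next, _, _) in zip(traj, traj[1:]):
            counts[(s, s_next)][a] += 1
    return {pair: c.most_common(1)[0][0] for pair, c in counts.items()}


def proxy_label(idm, unlabelled, default_action=None):
    """Attach a predicted action to every (state, reward) step."""
    proxy = []
    for traj in unlabelled:
        out = []
        for i, (s, r) in enumerate(traj):
            if i + 1 < len(traj):
                # Unseen transitions fall back to default_action (hypothetical choice).
                a = idm.get((s, traj[i + 1][0]), default_action)
            else:
                # The final step has no next state to condition on.
                a = default_action
            out.append((s, a, r))
        proxy.append(out)
    return proxy


def build_offline_dataset(labelled, unlabelled, default_action=None):
    """Return true-labelled plus proxy-labelled trajectories for offline RL."""
    idm = fit_inverse_dynamics(labelled)
    return labelled + proxy_label(idm, unlabelled, default_action)
```

As a usage example, on a chain of integer states with actions `+1` and `-1`, calling `build_offline_dataset([[(0, 1, 0.0), (1, 1, 0.0), (2, -1, 1.0), (1, 1, 0.0)]], [[(1, 0.0), (2, 1.0), (1, 0.0)]], default_action=0)` returns the labelled trajectory unchanged followed by the unlabelled one with inferred actions. For the continuous control tasks in D4RL, the lookup table would be replaced by a regression model trained on the same (state, next state) to action mapping.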
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Evolutionary Algorithms and Applications
