Offline Reinforcement Learning with Imputed Rewards

Carlo Romeo; Andrew D. Bagdanov

arXiv:2407.10839 · cs.LG · July 16, 2024

Open Access

TL;DR

This paper introduces a simple reward modeling approach that enables offline reinforcement learning with minimal reward-labeled data by imputing rewards for reward-free transitions, facilitating effective agent training in data-scarce scenarios.

Contribution

The paper proposes a reward model that estimates rewards from limited data and imputes them for reward-free transitions, expanding offline RL applicability in data-scarce environments.

Findings

01  The reward model accurately imputes rewards when only 1% of the data is reward-labeled.

02  Imputed rewards enable training of performant agents on continuous locomotion tasks.

03  The approach reduces the need for extensive reward annotation in offline RL.

Abstract

Offline Reinforcement Learning (ORL) offers a robust solution to training agents in applications where interactions with the environment must be strictly limited due to cost, safety, or lack of accurate simulation environments. Despite its potential to facilitate deployment of artificial agents in the real world, Offline Reinforcement Learning typically requires very many demonstrations annotated with ground-truth rewards. Consequently, state-of-the-art ORL algorithms can be difficult or impossible to apply in data-scarce scenarios. In this paper we propose a simple but effective Reward Model that can estimate the reward signal from a very limited sample of environment transitions annotated with rewards. Once the reward signal is modeled, we use the Reward Model to impute rewards for a large sample of reward-free transitions, thus enabling the application of ORL techniques. We…
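The label-then-impute workflow described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the excerpt does not specify the paper's Reward Model architecture, so ridge regression is swapped in as a stand-in, and the synthetic transitions, the linear ground-truth reward, and the helper names `fit_reward_model` and `impute_rewards` are all hypothetical.

```python
import numpy as np


def _with_bias(features):
    """Append a constant column so the model can learn an offset."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_reward_model(features, rewards, l2=1e-3):
    """Fit ridge-regression weights mapping transition features to rewards.

    Stand-in for the paper's Reward Model (an assumption, not the authors' method).
    """
    X = _with_bias(features)
    A = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ rewards)


def impute_rewards(weights, features):
    """Predict rewards for transitions that have no reward label."""
    return _with_bias(features) @ weights


# Synthetic offline dataset of (state, action, next_state) transitions.
rng = np.random.default_rng(0)
n, state_dim, action_dim = 10_000, 4, 2
states = rng.normal(size=(n, state_dim))
actions = rng.normal(size=(n, action_dim))
next_states = states + 0.1 * rng.normal(size=(n, state_dim))
features = np.hstack([states, actions, next_states])

# Hypothetical ground-truth reward (linear plus noise), used only for this demo.
true_w = rng.normal(size=features.shape[1])
true_rewards = features @ true_w + 0.01 * rng.normal(size=n)

# Only 1% of transitions keep their reward label.
mask = np.zeros(n, dtype=bool)
mask[rng.choice(n, size=n // 100, replace=False)] = True

# Fit on the labeled 1%, then impute rewards for the remaining 99%.
weights = fit_reward_model(features[mask], true_rewards[mask])
imputed = true_rewards.copy()
imputed[~mask] = impute_rewards(weights, features[~mask])

# Imputation error on the transitions that had no label.
mse = np.mean((imputed[~mask] - true_rewards[~mask]) ** 2)
print(f"imputation MSE on unlabeled transitions: {mse:.6f}")
```

The resulting `imputed` array, combined with the original transitions, forms a fully reward-annotated dataset that could then be passed to an offline RL algorithm. A linear model recovers the reward here only because the synthetic reward is itself linear; real locomotion rewards would likely need a more expressive regressor.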

Taxonomy

Topics: Reinforcement Learning in Robotics