Offline Reinforcement Learning with Imputed Rewards
Carlo Romeo, Andrew D. Bagdanov

TL;DR
This paper introduces a simple reward modeling approach that enables offline reinforcement learning with minimal reward-labeled data by imputing rewards for reward-free transitions, facilitating effective agent training in data-scarce scenarios.
Contribution
The paper proposes a reward model that estimates rewards from limited data and imputes them for reward-free transitions, expanding offline RL applicability in data-scarce environments.
Findings
The reward model accurately imputes rewards when only 1% of the data is reward-labeled.
Imputed rewards enable training of performant agents in continuous locomotion tasks.
Approach reduces the need for extensive reward annotations in offline RL.
Abstract
Offline Reinforcement Learning (ORL) offers a robust solution to training agents in applications where interactions with the environment must be strictly limited due to cost, safety, or lack of accurate simulation environments. Despite its potential to facilitate deployment of artificial agents in the real world, Offline Reinforcement Learning typically requires very many demonstrations annotated with ground-truth rewards. Consequently, state-of-the-art ORL algorithms can be difficult or impossible to apply in data-scarce scenarios. In this paper we propose a simple but effective Reward Model that can estimate the reward signal from a very limited sample of environment transitions annotated with rewards. Once the reward signal is modeled, we use the Reward Model to impute rewards for a large sample of reward-free transitions, thus enabling the application of ORL techniques. We…
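The two-step recipe in the abstract (fit a reward model on a small labeled sample, then impute rewards for the reward-free transitions) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the excerpt does not specify the reward model's architecture, so a ridge regression stands in for it. The synthetic data, the linear ground-truth reward, and the helper names `make_features`, `fit_reward_model`, and `impute_rewards` are all assumptions made for the example.

```python
import numpy as np


def make_features(states, actions, next_states):
    """Concatenate (s, a, s') into one feature vector per transition."""
    return np.hstack([states, actions, next_states])


def fit_reward_model(features, rewards, l2=1e-3):
    """Fit a ridge-regression stand-in for the reward model (assumption)."""
    X = np.hstack([features, np.ones((len(features), 1))])  # add bias column
    A = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ rewards)


def impute_rewards(weights, features):
    """Predict rewards for transitions that carry no reward label."""
    X = np.hstack([features, np.ones((len(features), 1))])
    return X @ weights


# Synthetic offline dataset (hypothetical; stands in for a locomotion dataset).
rng = np.random.default_rng(0)
n, state_dim, action_dim = 10_000, 4, 2
states = rng.normal(size=(n, state_dim))
actions = rng.normal(size=(n, action_dim))
next_states = states + 0.1 * rng.normal(size=(n, state_dim))
features = make_features(states, actions, next_states)

# Ground-truth rewards, assumed linear in the features plus small noise.
true_w = rng.normal(size=features.shape[1])
true_rewards = features @ true_w + 0.01 * rng.normal(size=n)

# Only 1% of the transitions keep their reward annotation.
labeled_mask = np.zeros(n, dtype=bool)
labeled_mask[rng.choice(n, size=n // 100, replace=False)] = True

# Step 1: fit the reward model on the labeled 1%.
weights = fit_reward_model(features[labeled_mask], true_rewards[labeled_mask])

# Step 2: impute rewards for the remaining 99% and assemble the full dataset.
rewards = np.where(labeled_mask, true_rewards, impute_rewards(weights, features))

# Error is measured only on the imputed transitions.
mse = np.mean((rewards[~labeled_mask] - true_rewards[~labeled_mask]) ** 2)
print(f"labeled: {labeled_mask.sum()}, imputed: {(~labeled_mask).sum()}, mse: {mse:.2e}")
```

The resulting `rewards` array, paired with the original transitions, is what an offline RL algorithm would then consume. In a realistic setting the reward is unlikely to be linear in the features, so a nonlinear regressor would probably take the place of the ridge model.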
Taxonomy
Topics: Reinforcement Learning in Robotics
