SR-Reward: Taking The Path More Traveled
Seyed Mahdi B. Azad, Zahra Padar, Gabriel Kalweit, Joschka Boedecker

TL;DR
This paper introduces SR-Reward, a novel method for learning reward functions from offline demonstrations using successor representations, which improves stability and robustness in offline reinforcement learning.
Contribution
The paper presents SR-Reward, a successor representation-based reward learning method that decouples reward from policy, enabling stable offline RL without adversarial training.
Findings
Achieves competitive results on the D4RL benchmark.
Enhances robustness with a negative sampling strategy.
Reveals advantages and limitations through ablation studies.
Abstract
In this paper, we propose a novel method for learning reward functions directly from offline demonstrations. Unlike traditional inverse reinforcement learning (IRL), our approach decouples the reward function from the learner's policy, eliminating the adversarial interaction typically required between the two. This results in a more stable and efficient training process. Our reward function, called SR-Reward, leverages successor representation (SR) to encode a state based on expected future states' visitation under the demonstration policy and transition dynamics. By utilizing the Bellman equation, SR-Reward can be learned concurrently with most reinforcement learning (RL) algorithms without altering the existing training pipeline. We also introduce a negative sampling strategy to mitigate overestimation errors by reducing rewards for out-of-distribution data, thereby enhancing…
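To make the successor-representation idea in the abstract concrete, here is a minimal tabular sketch of learning an SR from demonstration transitions with a Bellman (TD) update. It is an illustration under stated assumptions, not the paper's implementation: the function name `learn_sr`, the one-hot state features, the toy three-step demonstration, and the use of the L2 norm of the SR vector as a reward signal are all choices made here for the example.

```python
import numpy as np

def learn_sr(transitions, n_states, gamma=0.9, lr=0.1, epochs=200):
    """Tabular TD(0) estimate of the successor representation (SR).

    transitions: list of (state, next_state, done) tuples taken from
    demonstrations. Row M[s] approximates the expected discounted
    future visitation of every state when starting from s.
    """
    M = np.zeros((n_states, n_states))
    one_hot = np.eye(n_states)  # assumed one-hot state features
    for _ in range(epochs):
        for s, s_next, done in transitions:
            # Bellman target: current feature plus discounted SR of the successor
            bootstrap = 0.0 if done else gamma * M[s_next]
            target = one_hot[s] + bootstrap
            M[s] += lr * (target - M[s])
    return M

# Hypothetical demonstration: a chain 0 -> 1 -> 2 that ends at state 2.
# State 3 exists in the environment but never appears in the data.
demo = [(0, 1, False), (1, 2, False), (2, 2, True)]
M = learn_sr(demo, n_states=4)

# Assumed reward: the magnitude of each state's SR vector.
reward = np.linalg.norm(M, axis=1)
```

In this toy setup, states that lead to more demonstrated future visitation get larger values, and the unvisited state 3 gets zero, which matches the intuition behind the title's "path more traveled".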
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
