TROFI: Trajectory-Ranked Offline Inverse Reinforcement Learning
Alessandro Sestini, Joakim Bergdahl, Konrad Tollmar, Andrew D. Bagdanov, Linus Gisslén

TL;DR
TROFI learns policies offline without a pre-defined reward function by deriving a reward function from human preferences. It outperforms baselines on the D4RL benchmark and is validated in a 3D game environment.
Contribution
The paper presents TROFI, a novel offline inverse reinforcement learning approach that learns reward functions from preferences, eliminating the need for optimal trajectories.
Findings
TROFI outperforms baseline methods on D4RL benchmarks.
It performs comparably to training with the ground truth reward.
The method is validated in a 3D game environment.
Abstract
In offline reinforcement learning, agents are trained using only a fixed set of stored transitions derived from a source policy. However, this requires that the dataset be labeled by a reward function. In applied settings such as video game development, the availability of the reward function is not always guaranteed. This paper proposes Trajectory-Ranked OFfline Inverse reinforcement learning (TROFI), a novel approach to effectively learn a policy offline without a pre-defined reward function. TROFI first learns a reward function from human preferences, which it then uses to label the original dataset, making it usable for training the policy. In contrast to other approaches, our method does not require optimal trajectories. Through experiments on the D4RL benchmark, we demonstrate that TROFI consistently outperforms baselines and performs comparably to using the ground truth reward to…
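The two-stage idea in the abstract (learn a reward from ranked trajectories, then relabel the dataset) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the reward model or the loss, so the linear reward, the Bradley-Terry pairwise ranking loss, the toy trajectories, and the names `fit_linear_reward` and `relabel` are all assumptions made for illustration.

```python
import numpy as np


def fit_linear_reward(pairs, dim, lr=0.1, epochs=200):
    """Fit w so that r(s) = w @ s ranks `better` above `worse` in each pair.

    Assumption: a linear reward trained with the Bradley-Terry loss
    -log sigmoid(R(better) - R(worse)), where R sums predicted rewards
    along a trajectory. pairs is a list of (worse, better) arrays of
    shape (T, dim).
    """
    w = np.zeros(dim)
    # For a linear reward, the return gap depends only on summed features.
    diffs = np.array(
        [better.sum(axis=0) - worse.sum(axis=0) for worse, better in pairs]
    )
    for _ in range(epochs):
        p_better = 1.0 / (1.0 + np.exp(-(diffs @ w)))
        # Gradient of the mean negative log-likelihood with respect to w.
        grad = ((p_better - 1.0)[:, None] * diffs).mean(axis=0)
        w -= lr * grad
    return w


def relabel(w, states):
    """Label every stored state with the learned reward."""
    return states @ w


# Hypothetical toy data: six 5-step trajectories with 2 state features.
# Trajectory k is preferred over trajectory i whenever k > i; feature 0
# grows with the rank, feature 1 is a distractor that flips sign.
trajectories = [
    np.tile([k / 10.0, 0.1 * (-1) ** k], (5, 1)) for k in range(6)
]
pairs = [
    (trajectories[i], trajectories[j])
    for i in range(6)
    for j in range(i + 1, 6)
]

w = fit_linear_reward(pairs, dim=2)
dataset_states = np.concatenate(trajectories)  # shape (30, 2)
rewards = relabel(w, dataset_states)           # one reward per stored state
```

The relabeled `rewards` array could then be attached to the stored transitions and passed to any offline reinforcement learning algorithm. Note that the ranking only needs to order trajectories relative to each other; none of them has to be optimal.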
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Domain Adaptation and Few-Shot Learning
