Inverse Preference Learning: Preference-based RL without a Reward Function
Joey Hejna, Dorsa Sadigh

TL;DR
This paper introduces Inverse Preference Learning (IPL), a parameter-efficient offline preference-based RL method that learns directly from human preferences without explicitly modeling a reward function.
Contribution
IPL leverages the insight that the Q-function encodes all reward information, eliminating the need for a separate reward model, and achieves competitive results with fewer parameters.
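The idea that a Q-function can stand in for a reward model can be illustrated with a small sketch. It assumes the standard Bellman relation, rearranged so that the reward implied by Q is r(s, a) = Q(s, a) - gamma * V(s'), and a Bradley-Terry preference model over summed segment rewards. The function names, the single-sample next-state value, and the default discount are illustrative assumptions, not the authors' implementation, and the sketch omits any regularization or policy-extraction steps a full method would need.

```python
import numpy as np


def implied_reward(q_sa, v_next, gamma=0.99):
    """Reward implied by a Q-function via the rearranged Bellman equation.

    q_sa:   Q(s_t, a_t) for each step of a segment
    v_next: V(s_{t+1}) for each step (single-sample estimate, an assumption)
    """
    return np.asarray(q_sa) - gamma * np.asarray(v_next)


def preference_loss(q1, v1_next, q2, v2_next, label, gamma=0.99):
    """Bradley-Terry negative log-likelihood using Q-implied rewards.

    label is 1.0 if segment 1 is preferred and 0.0 if segment 2 is preferred.
    """
    # Sum the implied rewards over each segment.
    r1 = implied_reward(q1, v1_next, gamma).sum()
    r2 = implied_reward(q2, v2_next, gamma).sum()
    logit = r1 - r2
    # -log(sigmoid(x)) = log(1 + exp(-x)), computed stably with logaddexp.
    return label * np.logaddexp(0.0, -logit) + (1.0 - label) * np.logaddexp(0.0, logit)


# Hypothetical two-step segments: segment 1 has the higher implied return.
q1, v1 = np.array([1.0, 1.0]), np.array([0.0, 0.0])
q2, v2 = np.array([0.0, 0.0]), np.array([0.0, 0.0])
loss_agree = preference_loss(q1, v1, q2, v2, label=1.0)
loss_disagree = preference_loss(q1, v1, q2, v2, label=0.0)
```

In this sketch the loss is lower when the label agrees with the segment that has the higher Q-implied return, so gradients with respect to Q can be driven by preference labels alone, with no separate reward network.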
Findings
IPL performs well on continuous control and robotics benchmarks.
IPL is simpler and more parameter-efficient than transformer-based methods.
IPL matches or exceeds performance of complex reward modeling approaches.
Abstract
Reward functions are difficult to design and often hard to align with human intent. Preference-based Reinforcement Learning (RL) algorithms address these problems by learning reward functions from human feedback. However, the majority of preference-based RL methods naïvely combine supervised reward models with off-the-shelf RL algorithms. Contemporary approaches have sought to improve performance and query complexity by using larger and more complex reward architectures such as transformers. Instead of using highly complex architectures, we develop a new and parameter-efficient algorithm, Inverse Preference Learning (IPL), specifically designed for learning from offline preference data. Our key insight is that for a fixed policy, the Q-function encodes all information about the reward function, effectively making them interchangeable. Using this insight, we completely eliminate the…
Taxonomy
Topics: Advanced Graph Neural Networks · Peroxisome Proliferator-Activated Receptors · Reinforcement Learning in Robotics
