Rating-based Reinforcement Learning
Devin White, Mingkang Wu, Ellen Novoseller, Vernon J. Lawhern, Nicholas Waytowich, Yongcan Cao

TL;DR
This paper introduces a new reinforcement learning method that leverages human ratings of individual trajectories, rather than pairwise preferences, to guide learning, supported by a novel prediction model and loss function.
Contribution
It presents a novel rating-based reinforcement learning framework that differs from preference-based methods by using direct human ratings instead of relative comparisons.
Findings
Effective in synthetic and real human rating scenarios
Outperforms preference-based methods in certain tasks
Demonstrates improved alignment with human judgment
Abstract
This paper develops a novel rating-based reinforcement learning approach that uses human ratings to obtain human guidance in reinforcement learning. Unlike the existing preference-based and ranking-based reinforcement learning paradigms, which are based on relative human preferences over sample pairs, the proposed rating-based reinforcement learning approach is based on human evaluation of individual trajectories, without relative comparisons between sample pairs. The approach builds on a new prediction model for human ratings and a novel multi-class loss function. We conduct several experimental studies based on synthetic ratings and real human ratings to evaluate the effectiveness and benefits of the new rating-based reinforcement learning approach.
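The abstract names a rating prediction model and a multi-class loss but does not spell them out on this page. As a rough illustration only, and not the authors' exact formulation, the sketch below shows one way a predicted trajectory return could be turned into probabilities over discrete rating classes and scored against a human rating with cross-entropy. The function names, the class boundaries, and the sharpness parameter `k` are all hypothetical choices made for this example.

```python
import math

def rating_probs(traj_return, boundaries, k=10.0):
    """Soft-assign a normalized return in [0, 1] to rating classes.

    boundaries: sorted values b_0 < b_1 < ... < b_n giving n classes,
    where class i covers [b_i, b_{i+1}]. (Assumed setup, not from the paper.)
    """
    # The score for class i is positive when the return lies inside
    # [b_i, b_{i+1}] and negative outside, so the enclosing class wins.
    scores = [-k * (traj_return - lo) * (traj_return - hi)
              for lo, hi in zip(boundaries[:-1], boundaries[1:])]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def rating_cross_entropy(traj_return, human_rating, boundaries, k=10.0):
    """Multi-class cross-entropy between predicted classes and one rating."""
    probs = rating_probs(traj_return, boundaries, k)
    return -math.log(probs[human_rating])

# Three hypothetical rating classes: 0 = bad, 1 = okay, 2 = good.
bounds = [0.0, 0.33, 0.66, 1.0]
probs = rating_probs(0.9, bounds)
loss_good = rating_cross_entropy(0.9, 2, bounds)
loss_bad = rating_cross_entropy(0.9, 0, bounds)
```

In this toy setup a high predicted return paired with a "good" rating gives a small loss, while the same return paired with a "bad" rating gives a large one. Each rating scores a single trajectory, so no pairwise comparison is needed.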
Taxonomy
Topics: Multi-Criteria Decision Making · Economic and Environmental Valuation · Smart Parking Systems Research
