Rating-based Reinforcement Learning

Devin White; Mingkang Wu; Ellen Novoseller; Vernon J. Lawhern; Nicholas Waytowich; Yongcan Cao

arXiv:2307.16348 · cs.LG · January 30, 2024

Open Access

TL;DR

This paper introduces a new reinforcement learning method that leverages human ratings of individual trajectories, rather than pairwise preferences, to guide learning, supported by a novel prediction model and loss function.

Contribution

It presents a novel rating-based reinforcement learning framework that differs from preference-based methods by using direct human ratings instead of relative comparisons.

Findings

01 Effective in synthetic and real human rating scenarios

02 Outperforms preference-based methods in certain tasks

03 Demonstrates improved alignment with human judgment

Abstract

This paper develops a novel rating-based reinforcement learning approach that uses human ratings to obtain human guidance in reinforcement learning. Unlike the existing preference-based and ranking-based reinforcement learning paradigms, which rely on human relative preferences over sample pairs, the proposed approach is based on human evaluation of individual trajectories, without relative comparisons between sample pairs. The approach builds on a new prediction model for human ratings and a novel multi-class loss function. We conduct several experimental studies based on synthetic ratings and real human ratings to evaluate the effectiveness and benefits of the new approach.
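The abstract names a prediction model for ratings and a multi-class loss but does not give their form on this page. As a rough illustration only, the sketch below shows one way a rating-based objective could look: a predicted trajectory return, normalized to [0, 1], is mapped to a probability over discrete rating classes, and a cross-entropy loss compares that distribution with the human-assigned class. The function names, the class boundaries, the scale `k`, and the quadratic form of the logits are all assumptions made for this example, not the authors' stated method.

```python
import numpy as np

def rating_class_probs(norm_return, boundaries, k=10.0):
    """Hypothetical rating model (an assumption, not taken from the page).

    norm_return: predicted trajectory return, normalized to [0, 1].
    boundaries:  increasing array of length n_classes + 1, from 0.0 to 1.0.
    Returns one probability per rating class.
    """
    lower, upper = boundaries[:-1], boundaries[1:]
    # The product is negative when norm_return lies inside a class interval,
    # so that class receives the largest logit.
    logits = -k * (norm_return - lower) * (norm_return - upper)
    logits -= logits.max()  # shift for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

def rating_loss(norm_returns, ratings, boundaries, k=10.0):
    """Mean multi-class cross-entropy against integer ratings (0 = worst)."""
    losses = [
        -np.log(rating_class_probs(r, boundaries, k)[c])
        for r, c in zip(norm_returns, ratings)
    ]
    return float(np.mean(losses))

# Three illustrative rating classes with evenly spaced boundaries.
boundaries = np.array([0.0, 1 / 3, 2 / 3, 1.0])
probs = rating_class_probs(0.9, boundaries)
consistent = rating_loss([0.1, 0.9], [0, 2], boundaries)
inconsistent = rating_loss([0.1, 0.9], [2, 0], boundaries)
```

In this sketch each trajectory is scored on its own, so a single human label yields a training signal without a second trajectory to compare against; ratings that agree with the predicted returns (`consistent`) give a lower loss than ratings that contradict them (`inconsistent`).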

Taxonomy

Topics: Multi-Criteria Decision Making · Economic and Environmental Valuation · Smart Parking Systems Research