PrefCLM: Enhancing Preference-based Reinforcement Learning with Crowdsourced Large Language Models

Ruiqi Wang, Dezhong Zhao, Ziqin Yuan, Ike Obi, and Byung-Cheol Min

arXiv:2407.08213 · cs.RO · January 9, 2025

TL;DR

PrefCLM leverages crowdsourced large language models as simulated teachers in preference-based reinforcement learning, improving adaptability and user satisfaction in human-robot interaction scenarios.

Contribution

This paper introduces PrefCLM, a novel framework that uses crowdsourced LLMs and Dempster-Shafer Theory to enhance preference-based RL with human-in-the-loop refinement.

Findings

01. Achieves competitive performance with traditional methods

02. Facilitates more natural and efficient robot behaviors

03. Significantly improves user satisfaction in HRI

Abstract

Preference-based reinforcement learning (PbRL) is emerging as a promising approach to teaching robots through human comparative feedback, sidestepping the need for complex reward engineering. However, the substantial volume of feedback required in existing PbRL methods often leads to reliance on synthetic feedback generated by scripted teachers. This approach reintroduces the need for intricate reward engineering and struggles to adapt to the nuanced preferences particular to human-robot interaction (HRI) scenarios, where users may have unique expectations toward the same task. To address these challenges, we introduce PrefCLM, a novel framework that utilizes crowdsourced large language models (LLMs) as simulated teachers in PbRL. We utilize Dempster-Shafer Theory to fuse individual preferences from multiple LLM agents at the score level, efficiently leveraging their diversity and collective…
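The score-level fusion mentioned in the abstract can be illustrated with the standard Dempster's rule of combination. The following is a minimal sketch, not the paper's implementation: the two-hypothesis frame ("A" or "B" is the preferred trajectory), the `scores_to_mass` mapping, the `confidence` parameter, and the example numbers are all assumptions made for illustration.

```python
from functools import reduce
from itertools import product

# Frame of discernment (assumed): trajectory "A" is preferred, or "B" is.
A = frozenset({"A"})
B = frozenset({"B"})
THETA = A | B  # mass assigned here means "undecided between A and B"


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function is a dict mapping a frozenset of hypotheses to a mass.
    """
    fused = {}
    conflict = 0.0
    for (s1, w1), (s2, w2) in product(m1.items(), m2.items()):
        inter = s1 & s2
        if inter:
            fused[inter] = fused.get(inter, 0.0) + w1 * w2
        else:
            # Disjoint hypotheses: this product is conflicting evidence.
            conflict += w1 * w2
    if 1.0 - conflict < 1e-12:
        raise ValueError("total conflict: sources cannot be combined")
    # Renormalize by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in fused.items()}


def scores_to_mass(score_a, score_b, confidence):
    """Hypothetical mapping from one agent's two scores to a mass function.

    Assumption: positive scores are split proportionally, scaled by the
    agent's confidence in [0, 1]; the remainder goes to THETA.
    """
    total = score_a + score_b
    return {
        A: confidence * score_a / total,
        B: confidence * score_b / total,
        THETA: 1.0 - confidence,
    }


def fuse_agents(masses):
    """Fold Dempster's rule over the mass functions of all agents."""
    return reduce(combine, masses)


# Three illustrative agents scoring the same trajectory pair.
agents = [
    scores_to_mass(8.0, 4.0, confidence=0.7),
    scores_to_mass(6.0, 5.0, confidence=0.5),
    scores_to_mass(7.0, 2.0, confidence=0.9),
]
fused = fuse_agents(agents)
preferred = "A" if fused.get(A, 0.0) > fused.get(B, 0.0) else "B"
print(fused, preferred)
```

Keeping an explicit mass on `THETA` lets a low-confidence agent contribute little to the fused preference rather than being forced to pick a side, which is the usual motivation for Dempster-Shafer fusion over simple score averaging.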

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Speech and dialogue systems · Hate Speech and Cyberbullying Detection