PrefCLM: Enhancing Preference-based Reinforcement Learning with Crowdsourced Large Language Models
Ruiqi Wang, Dezhong Zhao, Ziqin Yuan, Ike Obi, and Byung-Cheol Min

TL;DR
PrefCLM leverages crowdsourced large language models as simulated teachers in preference-based reinforcement learning, improving adaptability and user satisfaction in human-robot interaction scenarios.
Contribution
This paper introduces PrefCLM, a novel framework that uses crowdsourced LLMs and Dempster-Shafer Theory to enhance preference-based RL with human-in-the-loop refinement.
Findings
Achieves performance competitive with traditional methods
Facilitates more natural and efficient robot behaviors
Significantly improves user satisfaction in HRI
Abstract
Preference-based reinforcement learning (PbRL) is emerging as a promising approach to teaching robots through human comparative feedback, sidestepping the need for complex reward engineering. However, the substantial volume of feedback required in existing PbRL methods often leads to reliance on synthetic feedback generated by scripted teachers. This approach again necessitates intricate reward engineering and struggles to adapt to the nuanced preferences particular to human-robot interaction (HRI) scenarios, where users may have unique expectations toward the same task. To address these challenges, we introduce PrefCLM, a novel framework that utilizes crowdsourced large language models (LLMs) as simulated teachers in PbRL. We utilize Dempster-Shafer Theory to fuse individual preferences from multiple LLM agents at the score level, efficiently leveraging their diversity and collective…
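The abstract names Dempster-Shafer Theory as the mechanism for fusing preferences from several LLM agents, but this summary does not show how the fusion works. The sketch below illustrates the textbook Dempster's rule of combination on a two-segment preference query. It is not the authors' implementation: the frame labels `seg0` and `seg1`, the helper `score_to_mass`, and the way an agent's score and confidence are turned into a mass function are all assumptions made for illustration.

```python
from functools import reduce
from itertools import product

# Frame of discernment (assumed): which of two trajectory segments is preferred.
SEG0 = frozenset({"seg0"})
SEG1 = frozenset({"seg1"})
EITHER = SEG0 | SEG1  # mass placed here expresses "undecided"


def score_to_mass(p_seg0, confidence):
    """Hypothetical mapping from one agent's output to a mass function.

    p_seg0 is the agent's probability that seg0 is preferred; confidence
    is how much of its belief it commits. The remainder goes to EITHER.
    """
    if not (0.0 <= p_seg0 <= 1.0 and 0.0 <= confidence <= 1.0):
        raise ValueError("p_seg0 and confidence must lie in [0, 1]")
    return {
        SEG0: confidence * p_seg0,
        SEG1: confidence * (1.0 - p_seg0),
        EITHER: 1.0 - confidence,
    }


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence accumulates on the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Contradictory evidence is tallied and normalized away.
            conflict += wa * wb
    norm = 1.0 - conflict
    if norm <= 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    return {k: v / norm for k, v in combined.items()}


def fuse(masses):
    """Fold Dempster's rule over any number of agents."""
    return reduce(combine, masses)
```

For example, fusing `{SEG0: 0.6, SEG1: 0.1, EITHER: 0.3}` with `{SEG0: 0.5, SEG1: 0.2, EITHER: 0.3}` gives a combined mass of about 0.76 on `seg0`, higher than either agent assigned alone, because the conflicting portion (0.17) is discarded and the remainder is renormalized.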
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Speech and dialogue systems · Hate Speech and Cyberbullying Detection
