ICPL: Few-shot In-context Preference Learning via LLMs
Chao Yu, Qixin Tan, Hong Lu, Jiaxuan Gao, Xinting Yang, Yu Wang, Yi Wu, Eugene Vinitsky

TL;DR
This paper introduces ICPL, a novel method leveraging Large Language Models' in-context learning to efficiently learn preferences in reinforcement learning, significantly outperforming baseline methods in both synthetic and human-in-the-loop settings.
Contribution
ICPL is the first approach to utilize LLMs' native preference-learning capabilities for sample-efficient reinforcement learning with human feedback.
Findings
ICPL outperforms baseline preference methods in synthetic tests.
ICPL achieves higher performance and efficiency in preference learning.
ICPL effectively incorporates human feedback in real-world trials.
Abstract
Preference-based reinforcement learning is an effective way to handle tasks where rewards are hard to specify but can be exceedingly inefficient as preference learning is often tabula rasa. We demonstrate that Large Language Models (LLMs) have native preference-learning capabilities that allow them to achieve sample-efficient preference learning, addressing this challenge. We propose In-Context Preference Learning (ICPL), which uses in-context learning capabilities of LLMs to reduce human query inefficiency. ICPL uses the task description and basic environment code to create sets of reward functions which are iteratively refined by placing human feedback over videos of the resultant policies into the context of an LLM and then requesting better rewards. We first demonstrate ICPL's effectiveness through a synthetic preference study, providing quantitative evidence that it significantly…
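The refinement loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `run_icpl`, `make_toy_propose`, `toy_rollout`, and `toy_prefer` are hypothetical names, the LLM call is replaced by a random sampler, a "reward function" is reduced to a single float weight, and the feedback format (one preferred and one rejected candidate per round) is an assumption.

```python
import random
from typing import Callable, Dict, List, Tuple

History = List[Dict[str, float]]


def run_icpl(
    propose: Callable[[History, int], List[float]],
    rollout: Callable[[float], float],
    prefer: Callable[[List[float]], Tuple[int, int]],
    n_iters: int = 3,
    k: int = 4,
) -> Tuple[float, History]:
    """Propose k candidate rewards, collect a preference, feed it back, repeat."""
    history: History = []  # stands in for the feedback placed in the LLM context
    best = None
    for _ in range(n_iters):
        candidates = propose(history, k)            # LLM writes reward functions
        videos = [rollout(c) for c in candidates]   # train a policy per reward, render it
        good, bad = prefer(videos)                  # human ranks the resulting behaviour
        best = candidates[good]
        history.append({"preferred": candidates[good], "rejected": candidates[bad]})
    return best, history


def make_toy_propose(seed: int) -> Callable[[History, int], List[float]]:
    """Hypothetical stand-in for the LLM: samples near the last preferred value."""
    rng = random.Random(seed)

    def propose(history: History, k: int) -> List[float]:
        center = history[-1]["preferred"] if history else 0.0
        return [center + rng.uniform(-1.0, 1.0) for _ in range(k)]

    return propose


TARGET = 2.0  # hidden "true" preference, used only by the toy annotator


def toy_rollout(weight: float) -> float:
    """Stand-in for policy training and video rendering: returns the weight itself."""
    return weight


def toy_prefer(videos: List[float]) -> Tuple[int, int]:
    """Stand-in for the human: picks the closest and farthest candidate from TARGET."""
    errors = [abs(v - TARGET) for v in videos]
    return errors.index(min(errors)), errors.index(max(errors))


best, history = run_icpl(make_toy_propose(0), toy_rollout, toy_prefer)
```

In a real setting `propose` would prompt an LLM with the task description, environment code, and accumulated feedback, and `prefer` would query a person watching policy videos; both are deliberately stubbed here.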
Taxonomy
Topics: Multimodal Machine Learning Applications · Web Data Mining and Analysis · Text and Document Classification Technologies
Methods: Sparse Evolutionary Training
