ICPL: Few-shot In-context Preference Learning via LLMs

Chao Yu; Qixin Tan; Hong Lu; Jiaxuan Gao; Xinting Yang; Yu Wang; Yi Wu; Eugene Vinitsky

arXiv:2410.17233·cs.AI·April 4, 2025

Open Access

TL;DR

This paper introduces ICPL, a novel method leveraging Large Language Models' in-context learning to efficiently learn preferences in reinforcement learning, significantly outperforming baseline methods in both synthetic and human-in-the-loop settings.

Contribution

ICPL is the first approach to utilize LLMs' native preference-learning capabilities for sample-efficient reinforcement learning with human feedback.

Findings

01 ICPL outperforms baseline preference methods in synthetic tests.

02 ICPL achieves higher performance and efficiency in preference learning.

03 ICPL effectively incorporates human feedback in real-world trials.

Abstract

Preference-based reinforcement learning is an effective way to handle tasks where rewards are hard to specify but can be exceedingly inefficient as preference learning is often tabula rasa. We demonstrate that Large Language Models (LLMs) have native preference-learning capabilities that allow them to achieve sample-efficient preference learning, addressing this challenge. We propose In-Context Preference Learning (ICPL), which uses in-context learning capabilities of LLMs to reduce human query inefficiency. ICPL uses the task description and basic environment code to create sets of reward functions which are iteratively refined by placing human feedback over videos of the resultant policies into the context of an LLM and then requesting better rewards. We first demonstrate ICPL's effectiveness through a synthetic preference study, providing quantitative evidence that it significantly…
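
The loop described in the abstract (propose reward functions, train policies, collect feedback on the resulting videos, place that feedback in the LLM's context, request better rewards) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`icpl_loop`, `toy_propose`, `toy_train_and_render`, `toy_ask_preference`), the best/worst form of the feedback, and the one-dimensional toy task are all hypothetical, and the LLM, the RL training, and the human are replaced by simple stubs.

```python
import random
from typing import Callable, List, Tuple


def icpl_loop(
    propose: Callable[[List[str], int], List[str]],
    train_and_render: Callable[[str], float],
    ask_preference: Callable[[List[float]], Tuple[int, int]],
    task_description: str,
    env_code: str,
    iterations: int = 3,
    k: int = 4,
) -> Tuple[str, List[str]]:
    """Iteratively refine reward functions from preference feedback (sketch)."""
    context = [task_description, env_code]  # initial prompt material
    best_reward = ""
    for _ in range(iterations):
        rewards = propose(context, k)                    # LLM call (stubbed below)
        videos = [train_and_render(r) for r in rewards]  # one rollout per reward
        best, worst = ask_preference(videos)             # feedback (stubbed below)
        # Assumption: feedback is recorded as a preferred and a rejected reward.
        context.append("preferred: " + rewards[best])
        context.append("rejected: " + rewards[worst])
        best_reward = rewards[best]
    return best_reward, context


# --- Toy stand-ins; every one of these is hypothetical ---
_rng = random.Random(0)


def toy_propose(context: List[str], k: int) -> List[str]:
    # A real system would prompt an LLM with `context`; here we sample targets.
    return [f"reward = -abs(x - {_rng.uniform(-1, 1):.3f})" for _ in range(k)]


def toy_train_and_render(reward_code: str) -> float:
    # Stand-in for RL training plus video rendering: the "video" is just the
    # target value that a policy trained on this reward would settle at.
    return float(reward_code.split("x - ")[1].rstrip(")"))


def toy_ask_preference(videos: List[float]) -> Tuple[int, int]:
    # Stand-in for a human who prefers behaviour with x near 0.5.
    closeness = [-abs(v - 0.5) for v in videos]
    return closeness.index(max(closeness)), closeness.index(min(closeness))


best_reward, context = icpl_loop(
    toy_propose,
    toy_train_and_render,
    toy_ask_preference,
    task_description="Move x to where the human wants it.",
    env_code="def step(x, a): return x + a",
)
```

In this sketch the context grows by two entries per iteration, so after the default three iterations it holds the task description, the environment code, and six feedback entries. A real system would presumably also manage context length and prompt formatting, which the stub ignores.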

Taxonomy

Topics: Multimodal Machine Learning Applications · Web Data Mining and Analysis · Text and Document Classification Technologies

Methods: Sparse Evolutionary Training