RLVF: Learning from Verbal Feedback without Overgeneralization

Moritz Stephan; Alexander Khazatsky; Eric Mitchell; Annie S Chen; Sheryl Hsu; Archit Sharma; Chelsea Finn

arXiv:2402.10893 · cs.LG · February 19, 2024 · 1 citation

Open Access 1 Repo

TL;DR

This paper introduces C3PO, a method for incorporating high-level verbal feedback into large language models without overgeneralization: the feedback is applied only in relevant contexts, and the model's original behavior is preserved everywhere else.

Contribution

C3PO is a novel approach that uses synthetic preference datasets and constrained fine-tuning to accurately apply verbal feedback without overgeneralizing to irrelevant scenarios.

Findings

01

C3PO reduces overgeneralization by 30%.

02

It effectively applies verbal feedback in relevant contexts.

03

Adherence to the given feedback is comparable to in-context baselines.

Abstract

The diversity of contexts in which large language models (LLMs) are deployed requires the ability to modify or customize default model behaviors to incorporate nuanced requirements and preferences. A convenient interface to specify such model adjustments is high-level verbal feedback, such as "Don't use emojis when drafting emails to my boss." However, while writing high-level feedback is far simpler than collecting annotations for reinforcement learning from human feedback (RLHF), we find that simply prompting a model with such feedback leads to overgeneralization of the feedback to contexts where it is not relevant. We study the problem of incorporating verbal feedback without such overgeneralization, inspiring a new method Contextualized Critiques with Constrained Preference Optimization (C3PO). C3PO uses a piece of high-level feedback to generate a small synthetic preference dataset…
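The idea of turning one piece of feedback into a small preference dataset can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `PreferencePair`, `build_pairs`, `is_relevant`, `baseline`, `revised`, and `dpo_style_loss` are hypothetical, the relevance check is a toy keyword test, and the pairwise logistic loss is a common preference-optimization objective assumed here for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # response that follows the feedback
    rejected: str  # response that ignores the feedback


def build_pairs(prompts, is_relevant, baseline, revised):
    """Create preference pairs only for prompts where the feedback applies.

    All arguments are hypothetical stand-ins: in practice `baseline` and
    `revised` would be model calls and `is_relevant` a scope check.
    """
    pairs, untouched = [], []
    for p in prompts:
        if is_relevant(p):
            pairs.append(PreferencePair(p, revised(p), baseline(p)))
        else:
            # Out-of-scope prompts keep the original response as the target,
            # so training on them discourages overgeneralization.
            untouched.append((p, baseline(p)))
    return pairs, untouched


def dpo_style_loss(logp_chosen, logp_rejected,
                   ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Pairwise logistic loss on log-probability margins (assumed objective)."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))


# Toy usage with the feedback "Don't use emojis when drafting emails to my boss."
prompts = ["Write an email to my boss about the launch.",
           "Text my friend about lunch."]
pairs, untouched = build_pairs(
    prompts,
    is_relevant=lambda p: "boss" in p,        # toy scope check (assumption)
    baseline=lambda p: "Sure! \U0001F389",    # default reply with an emoji
    revised=lambda p: "Sure.",                # reply that follows the feedback
)
```

In this toy run only the email-to-boss prompt yields a preference pair; the text-to-friend prompt is kept with its original response, which is the part that guards against applying the feedback where it is not relevant.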

Code & Models

Repositories

austrian-code-wizard/c3po
none · Official

Taxonomy

Topics: Fuzzy Logic and Control Systems · Neural Networks and Applications · Intelligent Tutoring Systems and Adaptive Learning