RLVF: Learning from Verbal Feedback without Overgeneralization
Moritz Stephan, Alexander Khazatsky, Eric Mitchell, Annie S Chen, Sheryl Hsu, Archit Sharma, Chelsea Finn

TL;DR
This paper introduces C3PO, a method for incorporating high-level verbal feedback into large language models without overgeneralization: the feedback is applied only in relevant contexts, and the model's original behavior is preserved elsewhere.
Contribution
C3PO is a novel approach that uses synthetic preference datasets and constrained fine-tuning to accurately apply verbal feedback without overgeneralizing to irrelevant scenarios.
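One way to picture how a synthetic preference dataset and constrained fine-tuning fit together is as a two-term objective. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes a DPO-style logistic preference loss on prompts where the feedback applies, plus a negative log-likelihood penalty that keeps the model close to its original responses on prompts where the feedback does not apply. The function names, the weight `lam`, and the temperature `beta` are all hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style loss for one preference pair (assumed form, for illustration).

    Arguments are sequence log-probabilities under the fine-tuned policy
    (pol_*) and the frozen reference model (ref_*).
    """
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    return softplus(-margin)  # equals -log(sigmoid(margin))


def constraint_loss(pol_logp_original):
    """Negative log-likelihood of the model's original response on a prompt
    where the feedback should NOT apply (assumed form of the constraint)."""
    return -pol_logp_original


def combined_loss(in_scope, out_of_scope, lam=1.0, beta=0.1):
    """Mean preference loss on in-scope pairs plus a weighted mean constraint
    loss on out-of-scope prompts. `lam` is a hypothetical trade-off weight."""
    pref = sum(preference_loss(*pair, beta=beta) for pair in in_scope) / len(in_scope)
    keep = sum(constraint_loss(lp) for lp in out_of_scope) / len(out_of_scope)
    return pref + lam * keep


# Toy log-probabilities, made up purely for illustration.
in_scope = [(-12.0, -15.0, -13.0, -14.0)]  # (pol_chosen, pol_rejected, ref_chosen, ref_rejected)
out_of_scope = [-3.0, -5.0]                # policy log-prob of the original responses
loss = combined_loss(in_scope, out_of_scope)
```

In this sketch, raising `lam` puts more weight on preserving the original behavior, while lowering it lets the feedback-driven preference term dominate.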
Findings
C3PO reduces overgeneralization by 30%.
It effectively applies verbal feedback in relevant contexts.
Adherence to the given feedback is comparable to in-context baselines.
Abstract
The diversity of contexts in which large language models (LLMs) are deployed requires the ability to modify or customize default model behaviors to incorporate nuanced requirements and preferences. A convenient interface to specify such model adjustments is high-level verbal feedback, such as "Don't use emojis when drafting emails to my boss." However, while writing high-level feedback is far simpler than collecting annotations for reinforcement learning from human feedback (RLHF), we find that simply prompting a model with such feedback leads to overgeneralization of the feedback to contexts where it is not relevant. We study the problem of incorporating verbal feedback without such overgeneralization, inspiring a new method Contextualized Critiques with Constrained Preference Optimization (C3PO). C3PO uses a piece of high-level feedback to generate a small synthetic preference dataset…
Taxonomy
Topics: Fuzzy Logic and Control Systems · Neural Networks and Applications · Intelligent Tutoring Systems and Adaptive Learning
