Reasons to Reject? Aligning Language Models with Judgments
Weiwen Xu, Deng Cai, Zhisong Zhang, Wai Lam, Shuming Shi

TL;DR
This paper introduces a novel framework called Contrastive Unlikelihood Training (CUT) for aligning large language models with judgments (natural-language feedback on their responses), and shows substantial performance improvements using only a small amount of judgment data.
Contribution
It presents the first systematic exploration of language feedback for LLM alignment and proposes CUT, a new method that effectively utilizes judgments for content correction and alignment.
Findings
CUT outperforms baseline models on AlpacaEval.
Using only 1317 judgment samples, CUT surpasses larger models.
Iterative alignment with judgments further improves performance.
Abstract
As humans, we consistently interact with our peers and receive feedback in the form of natural language. This language feedback allows us to maintain appropriate behavior and rectify potential errors. The question arises naturally: can we use language feedback to align large language models (LLMs)? In contrast to previous research that aligns LLMs with scalar rewards, we present the first systematic exploration of alignment through the lens of language feedback (i.e., judgment). We start with an in-depth investigation of potential methods that can be adapted for aligning LLMs with judgments, revealing that these methods cannot fully capitalize on judgments. To facilitate more effective utilization of judgments, we propose a novel framework, Contrastive Unlikelihood Training (CUT), that allows for fine-grained inappropriate content detection and correction based on judgments. Our…
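To make "fine-grained inappropriate content detection and correction" concrete, here is a minimal sketch of a token-level objective in the spirit of unlikelihood training. It assumes the per-token probabilities of a response have already been computed twice: once conditioned on the instruction alone and once on the instruction plus the judgment. The detection rule (flag a token when its probability rises sharply once the judgment is in context, controlled by a hypothetical `threshold`) and the function names `flag_tokens` and `mle_unlikelihood_loss` are illustrative assumptions, not taken from the summary above and not necessarily CUT's exact formulation. Flagged tokens receive the standard unlikelihood term -log(1 - p), and all other tokens receive the usual likelihood term -log p.

```python
import math
from typing import List, Sequence

EPS = 1e-12  # keeps log() finite when a probability is exactly 0 or 1


def flag_tokens(
    p_plain: Sequence[float],
    p_with_judgment: Sequence[float],
    threshold: float,
) -> List[bool]:
    """Hypothetical detection rule (an assumption, not necessarily CUT's rule).

    p_plain[t]: probability of response token t given the instruction only.
    p_with_judgment[t]: probability of the same token given instruction + judgment.
    A token is flagged when p_plain / p_with_judgment < threshold, i.e. when
    adding the judgment makes the token much more likely, which this sketch
    treats as a sign that the judgment is pointing at that token as an error.
    """
    if len(p_plain) != len(p_with_judgment):
        raise ValueError("probability sequences must have the same length")
    return [
        max(pp, EPS) / max(pj, EPS) < threshold
        for pp, pj in zip(p_plain, p_with_judgment)
    ]


def mle_unlikelihood_loss(p_plain: Sequence[float], flags: Sequence[bool]) -> float:
    """Mean token loss: -log p for unflagged tokens (likelihood term) and
    -log(1 - p) for flagged tokens (unlikelihood term)."""
    if len(p_plain) != len(flags) or len(p_plain) == 0:
        raise ValueError("need one flag per token and at least one token")
    total = 0.0
    for p, flagged in zip(p_plain, flags):
        p = min(max(p, EPS), 1.0 - EPS)  # clamp into the open interval (0, 1)
        total += -math.log(1.0 - p) if flagged else -math.log(p)
    return total / len(p_plain)


if __name__ == "__main__":
    # Toy three-token response: the middle token becomes far more likely once
    # the judgment is in context, so this sketch treats it as inappropriate.
    p_plain = [0.9, 0.2, 0.8]
    p_judged = [0.9, 0.6, 0.8]
    flags = flag_tokens(p_plain, p_judged, threshold=0.5)
    print(flags)  # [False, True, False]
    print(round(mle_unlikelihood_loss(p_plain, flags), 4))  # 0.1839
```

The design choice illustrated here is that correction is applied per token rather than per response: only the flagged token is pushed down, while the rest of the response is still trained with ordinary likelihood.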
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: ALIGN
