Reasons to Reject? Aligning Language Models with Judgments

Weiwen Xu; Deng Cai; Zhisong Zhang; Wai Lam; Shuming Shi

arXiv:2312.14591 · cs.CL · June 7, 2024 · 1 citation

TL;DR

This paper introduces Contrastive Unlikelihood Training (CUT), a framework for aligning large language models with natural-language feedback (judgments), and reports significant performance improvements from a small amount of judgment data.

Contribution

The paper presents the first systematic exploration of language feedback for LLM alignment and proposes CUT, a method that uses judgments for fine-grained detection and correction of inappropriate content.

Findings

01. CUT outperforms baseline models on AlpacaEval.

02. Using only 1317 judgment samples, CUT surpasses larger models.

03. Iterative alignment with judgments further improves performance.

Abstract

As humans, we consistently interact with our peers and receive feedback in the form of natural language. This language feedback allows us to maintain appropriate behavior, and rectify potential errors. The question arises naturally: can we use language feedback to align large language models (LLMs)? In contrast to previous research that aligns LLMs with scalar rewards, we present the first systematic exploration of alignment through the lens of language feedback (i.e., judgment). We start with an in-depth investigation of potential methods that can be adapted for aligning LLMs with judgments, revealing that these methods cannot fully capitalize on judgments. To facilitate more effective utilization of judgments, we propose a novel framework, Contrastive Unlikelihood Training (CUT), that allows for fine-grained inappropriate content detection and correction based on judgments. Our…
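The abstract names the technique but does not state CUT's objective. As rough orientation only, the sketch below shows generic token-level unlikelihood training: tokens marked as inappropriate are penalized for being likely, while all other tokens receive the usual likelihood loss. The function name, the `flagged` mask, and the example probabilities are hypothetical; how CUT derives such a mask from judgments is not shown here, and this is not the authors' implementation.

```python
import math


def unlikelihood_loss(token_probs, flagged, eps=1e-12):
    """Average per-token loss for one response (illustrative sketch only).

    token_probs: probability the model assigns to each target token.
    flagged: True where a token is treated as inappropriate
             (hypothetical mask; assumed to be given).
    """
    if len(token_probs) != len(flagged):
        raise ValueError("token_probs and flagged must have the same length")
    if not token_probs:
        raise ValueError("at least one token is required")

    total = 0.0
    for p, bad in zip(token_probs, flagged):
        if bad:
            # Unlikelihood term: push the probability of a flagged token down.
            total += -math.log(max(1.0 - p, eps))
        else:
            # Standard likelihood term: push the probability of the token up.
            total += -math.log(max(p, eps))
    return total / len(token_probs)


# Hypothetical three-token response in which the last token is flagged.
example = unlikelihood_loss([0.9, 0.8, 0.7], [False, False, True])
```

Under this loss, lowering the probability of a flagged token reduces the total, which is the opposite of what ordinary maximum-likelihood fine-tuning would do for that token.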

Code & Models

Repositories

wwxu21/cut (PyTorch, official)

Videos

Reasons to Reject? Aligning Language Models with Judgments · underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: ALIGN