Aligning Language Models Using Follow-up Likelihood as Reward Signal

Chen Zhang; Dading Chong; Feng Jiang; Chengguang Tang; Anningzhe Gao; Guohua Tang; Haizhou Li

arXiv:2409.13948·cs.CL·February 25, 2025

TL;DR

This paper introduces a novel reward signal based on follow-up utterance likelihood to improve language model alignment without human annotations, achieving competitive performance on preference benchmarks.

Contribution

It proposes Follow-up Likelihood as Reward (FLR), a new method for reward modeling that leverages follow-up utterance likelihood, and demonstrates its effectiveness in aligning language models.

Findings

01. FLR matches strong reward models on preference benchmarks.

02. Mining preference data from model generations boosts helpfulness.

03. Fine-tuning models with natural language feedback enhances FLR performance.

Abstract

In natural human-to-human conversations, participants often receive feedback signals from one another based on their follow-up reactions. These reactions can include verbal responses, facial expressions, changes in emotional state, and other non-verbal cues. Similarly, in human-machine interactions, the machine can leverage the user's follow-up utterances as feedback signals to assess whether it has appropriately addressed the user's request. Therefore, we propose using the likelihood of follow-up utterances as rewards to differentiate preferred responses from less favored ones, without relying on human or commercial LLM-based preference annotations. Our proposed reward mechanism, "Follow-up Likelihood as Reward" (FLR), matches the performance of strong reward models trained on large-scale human or GPT-4 annotated data on 8 pairwise-preference and 4 rating-based benchmarks. Building…
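The core idea in the abstract, scoring a response by how likely a follow-up utterance becomes after it, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `User:`/`Assistant:` prompt template, the averaging over follow-ups, and the `loglik` callback signature are all assumptions made for illustration. In practice `loglik` would be backed by a language model; here `toy_loglik` is a deliberately crude stand-in so that the sketch runs without any model.

```python
import math
from collections import Counter
from typing import Callable, Sequence

# Assumed signature for this sketch: log p(follow_up | dialogue) under some model.
LogLikFn = Callable[[str, str], float]


def follow_up_reward(context: str, response: str,
                     follow_ups: Sequence[str], loglik: LogLikFn) -> float:
    """Mean log-likelihood of the follow-ups after (context, response)."""
    # The prompt template is hypothetical, not taken from the paper.
    dialogue = f"User: {context}\nAssistant: {response}\nUser:"
    return sum(loglik(dialogue, f) for f in follow_ups) / len(follow_ups)


def pick_preferred(context: str, response_a: str, response_b: str,
                   follow_ups: Sequence[str], loglik: LogLikFn) -> str:
    """Return whichever response makes the follow-ups more likely."""
    reward_a = follow_up_reward(context, response_a, follow_ups, loglik)
    reward_b = follow_up_reward(context, response_b, follow_ups, loglik)
    return response_a if reward_a >= reward_b else response_b


def toy_loglik(dialogue: str, follow_up: str) -> float:
    """Stand-in scorer (NOT a language model): add-one-smoothed unigram
    log-probability of the follow-up's words, estimated from the dialogue."""
    words = dialogue.lower().split()
    counts = Counter(words)
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen words
    return sum(math.log((counts[w] + 1) / (len(words) + vocab))
               for w in follow_up.lower().split())
```

A call such as `pick_preferred(question, answer_a, answer_b, ["Thanks, that helps"], toy_loglik)` returns the answer with the higher follow-up reward. The follow-up phrase is a hypothetical example, and the toy scorer's choices carry no meaning beyond exercising the code path. Passing the scorer as a callback keeps the reward logic independent of any particular model or library.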

Code & Models

Repositories

e0397123/flr (PyTorch, official)

Videos

Aligning Language Models Using Follow-up Likelihood as Reward Signal · underline

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Speech and dialogue systems

Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Softmax · Layer Normalization · Dropout · Dense Connections