Aligning Large Language Models via Fine-grained Supervision

Dehong Xu; Liang Qiu; Minseok Kim; Faisal Ladhak; Jaeyoung Do

arXiv:2406.02756 · cs.CL · June 6, 2024

TL;DR

This paper introduces a fine-grained, token-level supervision method for aligning large language models. Compared with traditional sequence-level feedback, it improves win rate against reference models by up to 5.1%.

Contribution

The paper proposes a token-level reward model trained on minimally edited responses, giving LLM alignment a finer-grained signal than coarse sequence-level preferences.

Findings

01. Achieves up to 5.1% improvement in win rate against reference models

02. Demonstrates the effectiveness of fine-grained supervision in LLM alignment

03. Outperforms traditional PPO-based alignment methods

Abstract

Pre-trained large-scale language models (LLMs) excel at producing coherent articles, yet their outputs may be untruthful, toxic, or fail to align with user expectations. Current approaches focus on using reinforcement learning with human feedback (RLHF) to improve model alignment, which works by transforming coarse human preferences of LLM outputs into a feedback signal that guides the model learning process. However, because this approach operates on sequence-level feedback, it lacks the precision to identify the exact parts of the output affecting user preferences. To address this gap, we propose a method to enhance LLM alignment through fine-grained token-level supervision. Specifically, we ask annotators to minimally edit less preferred responses within the standard reward modeling dataset to make them more favorable, ensuring changes are made only where necessary while retaining…
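The minimal-edit idea in the abstract can be illustrated with a small sketch: given a less preferred response and its edited version, a diff identifies which tokens the annotator changed. This is not the authors' pipeline. The function name `token_edit_labels`, the whitespace tokenization, and the -1 / 0 / +1 labeling scheme are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def token_edit_labels(rejected: str, edited: str):
    """Align two responses token by token and mark what the edit changed.

    Hypothetical labeling scheme (an assumption, not taken from the paper):
      0  -> token is unchanged by the edit
      -1 -> token in the rejected response that was removed or replaced
      +1 -> token in the edited response that was added or is a replacement
    """
    a = rejected.split()  # naive whitespace tokens, not a real LLM tokenizer
    b = edited.split()
    a_labels = [0] * len(a)
    b_labels = [0] * len(b)

    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # tokens shared by both responses keep label 0
        for i in range(i1, i2):  # covers 'delete' and 'replace'
            a_labels[i] = -1
        for j in range(j1, j2):  # covers 'insert' and 'replace'
            b_labels[j] = 1

    return list(zip(a, a_labels)), list(zip(b, b_labels))


rejected_labels, edited_labels = token_edit_labels(
    "The capital of Australia is Sydney .",
    "The capital of Australia is Canberra .",
)
print(rejected_labels)
print(edited_labels)
```

In this toy pair only "Sydney" and "Canberra" receive non-zero labels, which is the kind of localized signal a sequence-level preference cannot provide. How such labels would feed into a reward model is left open here.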

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Focus · ALIGN · Entropy Regularization · Proximal Policy Optimization