Preference-grounded Token-level Guidance for Language Model Fine-tuning

Shentao Yang; Shujian Zhang; Congying Xia; Yihao Feng; Caiming Xiong; Mingyuan Zhou

arXiv:2306.00398·cs.CL·January 9, 2025·1 cites

TL;DR

This paper introduces a training method that grounds sequence-level preferences into token-level guidance, alternating between learning that guidance and fine-tuning the language model with it.

Contribution

The paper proposes an alternating training framework that extends pairwise-preference learning to token-level guidance, addressing the granularity mismatch between sequence-level preferences and token-level LM training.

Findings

01 Competitive performance on prompt generation tasks

02 Effective preference grounding at token level

03 Versatile approach for different LM tasks

Abstract

Aligning language models (LMs) with preferences is an important problem in natural language generation. A key challenge is that preferences are typically provided at the sequence level while LM training and generation both occur at the token level. There is, therefore, a granularity mismatch between the preference and the LM training losses, which may complicate the learning problem. In this paper, we address this issue by developing an alternate training process, where we iterate between grounding the sequence-level preference into token-level training guidance, and improving the LM with the learned guidance. For guidance learning, we design a framework that extends the pairwise-preference learning in imitation learning to both variable-length LM generation and the utilization of the preference among multiple generations. For LM training, based on the amount of supervised data, we…
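
The guidance-learning step described in the abstract can be illustrated with a minimal sketch. It assumes, without confirmation from the abstract, that token-level rewards are averaged into a sequence-level score (so that generations of different lengths stay comparable) and that a preference ordering over several generations is scored with a Plackett-Luce likelihood. The function names `sequence_score` and `listwise_preference_loss` are hypothetical and are not the authors' implementation.

```python
import math
from typing import List, Sequence


def sequence_score(token_rewards: Sequence[float]) -> float:
    """Aggregate token-level rewards into one sequence-level score.

    Averaging is an assumption made for this sketch; it keeps scores
    comparable across generations with different numbers of tokens.
    """
    if not token_rewards:
        raise ValueError("a generation needs at least one token")
    return sum(token_rewards) / len(token_rewards)


def listwise_preference_loss(ranked_token_rewards: List[Sequence[float]]) -> float:
    """Plackett-Luce negative log-likelihood of a preference ordering.

    ranked_token_rewards[0] holds the token rewards of the most preferred
    generation and ranked_token_rewards[-1] those of the least preferred.
    With exactly two generations this reduces to a pairwise
    (Bradley-Terry style) preference loss.
    """
    scores = [sequence_score(r) for r in ranked_token_rewards]
    loss = 0.0
    for i in range(len(scores) - 1):
        tail = scores[i:]
        m = max(tail)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(s - m) for s in tail))
        loss += log_norm - scores[i]
    return loss
```

In this sketch the loss is small when the token rewards already rank the preferred generation highest, for example `listwise_preference_loss([[2.0, 2.0, 2.0], [0.0, 0.0]])`, and large when the ordering is reversed. In a real setup the token rewards would come from a learned model and the loss would be minimized with respect to its parameters.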

Code & Models

Videos

Preference-grounded Token-level Guidance for Language Model Fine-tuning · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications