Sequence-level Large Language Model Training with Contrastive Preference Optimization

Zhili Feng; Dhananjay Ram; Cole Hawkins; Aditya Rawal; Jinman Zhao; Sheng Zha

arXiv:2502.16433·cs.CL·February 25, 2025

TL;DR

This paper introduces contrastive preference optimization (CPO), a training procedure that injects sequence-level signals into large language models without expensive human-labeled data. It reports higher win rates than next token prediction on instruction-following and text generation tasks.

Contribution

It proposes a novel contrastive preference optimization technique to inject sequence-level understanding into language models during training.

Findings

1. CPO surpasses next token prediction in win rate on instruction-following tasks.
2. CPO also surpasses next token prediction in win rate on text generation tasks.
3. Sequence-level signals can be injected at any training stage without expensive human-labeled data.

Abstract

The next token prediction loss is the dominant self-supervised training objective for large language models and has achieved promising results in a variety of downstream tasks. However, upon closer investigation of this objective, we find that it lacks an understanding of sequence-level signals, leading to a mismatch between the training and inference processes. To bridge this gap, we introduce a contrastive preference optimization (CPO) procedure that can inject sequence-level information into the language model at any training stage without expensive human-labeled data. Our experiments show that the proposed objective surpasses next token prediction in terms of win rate on instruction-following and text generation tasks.
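To make the token-level versus sequence-level distinction concrete, here is a minimal Python sketch. It is not the paper's objective, which this page does not spell out. It assumes a generic pairwise preference loss of the form -log sigmoid(beta * (log p(preferred) - log p(rejected))), and the function names, the `beta` parameter, and the idea of pairing a reference continuation with an automatically produced alternative are all hypothetical choices made for illustration.

```python
import math


def next_token_loss(token_log_probs):
    """Average negative log-likelihood per token (token-level objective)."""
    return -sum(token_log_probs) / len(token_log_probs)


def sequence_log_prob(token_log_probs):
    """Sum per-token log-probabilities into one sequence-level score."""
    return sum(token_log_probs)


def pairwise_preference_loss(preferred_log_probs, rejected_log_probs, beta=1.0):
    """Illustrative sequence-level loss: -log sigmoid(beta * margin).

    Assumption: `preferred_log_probs` scores a reference continuation and
    `rejected_log_probs` scores an automatically produced alternative, so no
    human labels are needed. This pairing is hypothetical, not from the paper.
    """
    margin = beta * (
        sequence_log_prob(preferred_log_probs)
        - sequence_log_prob(rejected_log_probs)
    )
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Per-token log-probabilities the model assigns to two whole continuations.
preferred = [math.log(0.5), math.log(0.5)]
rejected = [math.log(0.25), math.log(0.25)]

token_level = next_token_loss(preferred)
sequence_level = pairwise_preference_loss(preferred, rejected)
```

The token-level loss looks at each position independently, while the pairwise loss compares two complete sequences, which is the kind of signal the abstract says next token prediction lacks.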

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies