Sequence-level Large Language Model Training with Contrastive Preference Optimization
Zhili Feng, Dhananjay Ram, Cole Hawkins, Aditya Rawal, Jinman Zhao, Sheng Zha

TL;DR
This paper introduces contrastive preference optimization (CPO), a training procedure that injects sequence-level signals into large language models at any training stage. It improves win rate on instruction-following and text generation tasks without expensive human-labeled data.
Contribution
It proposes a contrastive preference optimization (CPO) technique that injects sequence-level information into language models at any training stage, addressing the mismatch between token-level training and sequence-level inference.
Findings
CPO surpasses next token prediction in win rate for instruction-following tasks.
CPO also achieves a higher win rate than next token prediction on text generation tasks.
Sequence-level signals are injected without expensive human-labeled data.
Abstract
The next token prediction loss is the dominant self-supervised training objective for large language models and has achieved promising results in a variety of downstream tasks. However, upon closer investigation of this objective, we find that it lacks an understanding of sequence-level signals, leading to a mismatch between training and inference processes. To bridge this gap, we introduce a contrastive preference optimization (CPO) procedure that can inject sequence-level information into the language model at any training stage without expensive human-labeled data. Our experiments show that the proposed objective surpasses next token prediction in terms of win rate in the instruction-following and text generation tasks.
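The abstract does not spell out the CPO objective, so the following is only a rough illustration of the token-level versus sequence-level distinction it describes. It contrasts the next-token loss, where every term is scored under the gold prefix (teacher forcing), with a contrastive loss over whole continuations. This is an assumption, not the authors' formulation: the function names, the `beta` temperature, and the implicit reward r(y) = beta * (log pi_theta(y|x) - log pi_ref(y|x)) are borrowed from Direct Preference Optimization (DPO). Treating the corpus continuation as preferred and model-generated samples as dispreferred is one hypothetical way to obtain preference pairs without human labels.

```python
import math
from typing import Sequence


def next_token_loss(token_logprobs: Sequence[float]) -> float:
    """Token-level NTP loss: mean negative log-likelihood of the reference tokens.

    Each log-probability is computed under teacher forcing (conditioned on the
    gold prefix), so no term ever scores a sequence the model generated itself.
    """
    return -sum(token_logprobs) / len(token_logprobs)


def sequence_logprob(token_logprobs: Sequence[float]) -> float:
    """Sequence-level score: log-probability of a whole continuation (sum over tokens)."""
    return sum(token_logprobs)


def contrastive_preference_loss(
    pos_policy: float,
    pos_ref: float,
    neg_policy: Sequence[float],
    neg_ref: Sequence[float],
    beta: float = 0.1,
) -> float:
    """Hypothetical DPO-style contrastive loss over whole sequences.

    ASSUMPTION: borrowed from DPO, not taken from the CPO paper. One preferred
    sequence competes against K dispreferred ones; the loss is
    -log softmax(preferred implicit reward). With K = 1 it reduces to the
    standard DPO loss -log sigmoid(r_pos - r_neg).
    """
    # Implicit reward r(y) = beta * (log pi_theta(y|x) - log pi_ref(y|x)).
    pos_reward = beta * (pos_policy - pos_ref)
    neg_rewards = [beta * (p - r) for p, r in zip(neg_policy, neg_ref)]
    logits = [pos_reward] + neg_rewards
    # Numerically stable log-sum-exp over all candidate sequences.
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - pos_reward


# Hypothetical per-token log-probs for a single prompt x.
gold_policy = [-0.2, -0.5, -0.1]    # corpus continuation under the model being trained
gold_ref = [-0.3, -0.6, -0.2]       # same continuation under a frozen reference model
sample_policy = [-0.4, -0.3, -0.2]  # a continuation sampled from the model itself
sample_ref = [-0.4, -0.3, -0.2]

ntp = next_token_loss(gold_policy)
seq_loss = contrastive_preference_loss(
    sequence_logprob(gold_policy),
    sequence_logprob(gold_ref),
    [sequence_logprob(sample_policy)],
    [sequence_logprob(sample_ref)],
    beta=0.1,
)
print(f"token-level NTP loss: {ntp:.4f}")
print(f"sequence-level contrastive loss: {seq_loss:.4f}")
```

The contrastive term only decreases when the whole preferred continuation gains probability relative to the competing sequences, which is the kind of sequence-level signal the next-token loss never sees. Whether CPO uses a reference model, how many negatives it samples, and how they are generated are not stated in the abstract.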
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
