Intermediate direct preference optimization
Atsushi Kojima

TL;DR
This paper introduces intermediate direct preference optimization (intermediate DPO), a method that computes the DPO loss at selected intermediate layers of a large language model as an auxiliary training loss. It reports higher GPT-4-judged win rates than conventional DPO.
Contribution
The paper proposes intermediate DPO, which adds DPO losses computed from the logits of selected intermediate layers to the conventional DPO loss computed from the final layer, with the aim of improving the fine-tuning of large language models.
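The objective summarized above can be sketched in standard DPO notation. This is a hedged reconstruction, not the paper's own equations. It assumes a policy π_θ, a frozen SFT reference π_ref, a temperature β, and a preferred response y_w and dispreferred response y_l for prompt x. It writes π^(ℓ) for the distribution obtained from layer ℓ's logits, with L the final layer and ℓ_1, …, ℓ_K the selected intermediate layers. The mixing weight λ and the (1 − λ)/λ form of the weighted sum are assumptions.

```latex
\mathcal{L}_{\mathrm{DPO}}^{(\ell)} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[\log\sigma\!\left(\beta\log\frac{\pi_\theta^{(\ell)}(y_w\mid x)}{\pi_{\mathrm{ref}}^{(\ell)}(y_w\mid x)}-\beta\log\frac{\pi_\theta^{(\ell)}(y_l\mid x)}{\pi_{\mathrm{ref}}^{(\ell)}(y_l\mid x)}\right)\right]

\mathcal{L}_{\mathrm{DPO}} = \mathcal{L}_{\mathrm{DPO}}^{(L)}, \qquad
\mathcal{L}_{\mathrm{inter}} = \frac{1}{K}\sum_{k=1}^{K}\mathcal{L}_{\mathrm{DPO}}^{(\ell_k)}, \qquad
\mathcal{L} = (1-\lambda)\,\mathcal{L}_{\mathrm{DPO}} + \lambda\,\mathcal{L}_{\mathrm{inter}}
```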
Findings
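Intermediate DPO achieves higher win rates than conventional DPO, as judged by GPT-4 on the UltraFeedback dataset.
Performance depends on which intermediate layers are selected.
Intermediate DPO uses logits from multiple layers during training, while inference still decodes from the final layer as in conventional DPO.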
Abstract
We propose the intermediate direct preference optimization (DPO) method, which calculates the DPO loss at selected intermediate layers as an auxiliary loss for fine-tuning large language models (LLMs). The conventional DPO method fine-tunes a supervised fine-tuning (SFT) model by calculating the DPO loss using logits from the final layer. In our intermediate DPO approach, DPO losses are calculated using the logits from K selected intermediate layers and averaged to obtain the intermediate DPO loss. To train the intermediate DPO model, the final loss is obtained as a weighted sum of the DPO and intermediate DPO losses. During inference, the intermediate DPO model decodes using the final-layer logits, just as the conventional DPO model does. In experiments using the UltraFeedback dataset, the performance of the intermediate DPO model was evaluated using GPT-4. As a result, the…
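To make the training objective described in the abstract concrete, here is a minimal pure-Python sketch of the loss for a single preference pair. It is not the author's implementation. It assumes that per-layer sequence log-probabilities are already available: for example, by applying the model's output head to each selected layer's hidden states, which the abstract does not specify. It also assumes that reference log-probabilities come from the frozen SFT model at the same layer. The function names, the mixing weight `lam`, the default `beta=0.1`, and the layer indices and numbers in the example are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (policy chosen, policy rejected, reference chosen, reference rejected)
# ASSUMPTION: each value is a summed sequence log-prob taken from one layer's logits.
LogProbs = Tuple[float, float, float, float]


def sequence_logprob(logits: Sequence[Sequence[float]], token_ids: Sequence[int]) -> float:
    """Sum of log-softmax probabilities of `token_ids` under per-position `logits`."""
    total = 0.0
    for row, tok in zip(logits, token_ids):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += row[tok] - log_z
    return total


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(lp: LogProbs, beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair."""
    pi_w, pi_l, ref_w, ref_l = lp
    margin = beta * ((pi_w - ref_w) - (pi_l - ref_l))
    return -log_sigmoid(margin)


def intermediate_dpo_loss(layer_lps: Dict[int, LogProbs], selected: List[int],
                          beta: float = 0.1) -> float:
    """Average of the DPO losses computed at the K selected intermediate layers."""
    losses = [dpo_loss(layer_lps[k], beta) for k in selected]
    return sum(losses) / len(losses)


def total_loss(final_lp: LogProbs, layer_lps: Dict[int, LogProbs], selected: List[int],
               beta: float = 0.1, lam: float = 0.5) -> float:
    """Weighted sum of final-layer DPO and intermediate DPO.

    ASSUMPTION: the (1 - lam) / lam weighting form is illustrative; the abstract
    only says "weighted sum".
    """
    return ((1.0 - lam) * dpo_loss(final_lp, beta)
            + lam * intermediate_dpo_loss(layer_lps, selected, beta))


if __name__ == "__main__":
    # Hypothetical log-probs for one preference pair (made-up numbers and layer indices).
    final = (-12.0, -15.0, -13.0, -14.0)
    layers = {8: (-20.0, -21.0, -20.5, -20.5), 16: (-16.0, -18.0, -16.5, -17.0)}
    print(total_loss(final, layers, selected=[8, 16], beta=0.1, lam=0.5))
```

Setting `lam=0.0` recovers conventional final-layer DPO. The `selected` list is where the choice of intermediate layer positions enters. Because inference decodes from the final-layer logits only, this auxiliary term affects training alone.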
Taxonomy
Topics: Multi-Criteria Decision Making
Methods: Attention Is All You Need · Direct Preference Optimization · Linear Layer · Layer Normalization · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings
