Model Extrapolation Expedites Alignment

Chujie Zheng; Ziqi Wang; Heng Ji; Minlie Huang; Nanyun Peng

arXiv:2404.16792·cs.LG·June 2, 2025·1 cites

TL;DR

This paper introduces ExPO, a simple method that accelerates large language model alignment by amplifying parameter changes, significantly reducing training time while maintaining or improving performance.

Contribution

ExPO uses a first-order approximation to amplify the parameter change between the initial SFT checkpoint and a partially trained model. It adds no training overhead, and the extrapolated model can outperform a fully trained one while using fewer training steps.
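The first-order argument can be sketched as follows. This is a hedged reading of the summary, not the paper's own derivation: the symbols $\theta_0$ (SFT checkpoint), $\theta_1$ (partially trained model), $\omega$ (implicit alignment objective) and $\alpha$ (extrapolation strength) are notation assumed here for illustration.

```latex
% Assumed notation: \theta_0 = SFT checkpoint, \theta_1 = partially trained model,
% \Delta\theta = \theta_1 - \theta_0, \omega = implicit alignment objective, \alpha \ge 0.
\begin{align}
  \theta_2 &= \theta_1 + \alpha\,\Delta\theta, \qquad \Delta\theta = \theta_1 - \theta_0, \\
  \omega(\theta_2) &\approx \omega(\theta_1) + \alpha\,\nabla\omega(\theta_1)^{\top}\Delta\theta .
\end{align}
% If \nabla\omega(\theta_1)^{\top}\Delta\theta > 0, i.e. training is still moving in a
% direction that improves \omega, then a sufficiently small \alpha > 0 increases
% \omega to first order without any further gradient steps.
```

Under this reading, the approximation is only trustworthy for small $\alpha$, which is consistent with the reviewers' concern that $\alpha$ has no upper bound and must be tuned.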

Findings

01

ExPO boosts a DPO model trained for only 20% of the steps to outperform the fully trained one.

02

ExPO improves alignment of open-source LLMs across multiple benchmarks.

03

The method reduces training overhead while maintaining high performance.

Abstract

Given the high computational cost of preference alignment training of large language models (LLMs), exploring efficient methods to reduce the training overhead remains an important and compelling research problem. Motivated by the observation that alignment training typically involves only small parameter changes without injecting new knowledge into models, we propose a straightforward method called ExPO (model extrapolation) to expedite LLMs' alignment with human preferences. Given a partially trained model and its initial SFT checkpoint, ExPO improves the implicit optimization objective of alignment training by simply amplifying the parameter change based on a first-order approximation, without any additional training overhead. Through controlled experiments, we demonstrate that ExPO boosts a DPO model trained with only 20% of the steps to outperform the fully trained one. Moreover, we show…
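The weight amplification described in the abstract can be illustrated with a minimal sketch. It assumes the update has the form `aligned + alpha * (aligned - sft)` applied to every parameter; the function name `extrapolate`, the toy parameter names, and the plain-list checkpoint format are hypothetical choices for illustration and are not taken from the official repository.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def extrapolate(sft: StateDict, aligned: StateDict, alpha: float) -> StateDict:
    """Return aligned + alpha * (aligned - sft), parameter by parameter.

    Assumed form of the extrapolation step; alpha = 0 returns the
    partially trained model unchanged.
    """
    if sft.keys() != aligned.keys():
        raise ValueError("checkpoints must share the same parameter names")
    result: StateDict = {}
    for name, w_aligned in aligned.items():
        w_sft = sft[name]
        if len(w_sft) != len(w_aligned):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Move further along the direction from the SFT weights to the aligned weights.
        result[name] = [a + alpha * (a - s) for s, a in zip(w_sft, w_aligned)]
    return result


# Hypothetical two-parameter "model" used only to show the arithmetic.
sft = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
aligned = {"layer.weight": [1.5, 2.0], "layer.bias": [-0.25]}
expo = extrapolate(sft, aligned, alpha=0.5)
print(expo)
```

No gradient computation is involved: the step is a single pass over the two checkpoints, which is why it adds no training overhead. In practice `alpha` would be chosen on a validation set, since the source gives no fixed value.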

Peer Reviews

Decision·ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 6 · Confidence 3

Strengths

+ Addresses the important task of efficient LLM alignment, with the interesting idea of extrapolating the LLM weights along the direction from the initial model toward the partially trained one
+ Experiments are done on multiple open-source LLMs and benchmarks

Weaknesses

- Lack of enough theoretical analysis
- Some parameters seem difficult to fix (e.g., \alpha)
- English should be improved

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The paper provides a theoretical explanation with comprehensive experimental results to show the effectiveness of ExPO.
2. The paper presents thorough experimental results that span various model architectures and alignment techniques, highlighting ExPO's flexibility and robustness.
3. The results suggest that ExPO could be beneficial, especially for LLMs, as it provides a computationally economical pathway to enhance alignment.

Weaknesses

1. The core idea of ExPO seems to be a straightforward extension of the existing model-merging concept, primarily adjusting the interpolation parameter to a negative value (extrapolation). While the results are promising, this incremental shift from interpolation to extrapolation may be seen as lacking in true innovation.
2. Unlike interpolation, which typically operates within a bounded range [0,1], the extrapolation parameter α in ExPO operates within an open range [0, +∞). This open-ended range can

Reviewer 03 · Rating 3 · Confidence 4

Strengths

- The paper studies a method for improving reward-fine-tuned models. This is an important problem in current LLM research.
- The paper is well-written and easy to understand. Figure 2 presents a clear indication of how the method works. Section 2 clearly presents the hypothesis, and the method is neatly summarized by equation 2.
- The results presented are significant. Table 1 demonstrates significant gains in Win Rate using the proposed method. Table 5 demonstrates that the results extend to ot

Weaknesses

1) To me, there seems to be a disconnect between the theory presented in Section 2 and the results presented. The results seem to point towards different conclusions that contradict the theory.
   a) Line 192-193: Extrapolation strongly improves the results obtained by "DPO 100%". This is not predicted by the theory presented. The theory presented essentially states, "we can partially train an RLHF model, then predict the result of fully training. The resulting predicted model will achieve the ac

Reviewer 04 · Rating 3 · Confidence 4

Strengths

1. The paper tackles an important problem, reducing the computational cost of aligning large language models, which is crucial for scaling models efficiently.
2. The explanation of the method is clear and easy to follow, with helpful figures that enhance understanding.
3. The authors show ExPO's ability to cut down alignment training costs, which supports their claim.

Weaknesses

1. **Limited theoretical foundation**: The paper presents an interesting empirical finding with ExPO, but lacks a rigorous theoretical analysis of why it works. The discussion in Section 3.3 on why ExPO can work is somewhat speculative. A more formal theoretical treatment, perhaps drawing connections to optimization theory or analyzing the loss landscape, would strengthen the paper's contribution.
2. **Limited analysis of failure cases**: While some negative results are reported (e.g. for K

Code & Models

Repositories

chujiezheng/llm-extrapolation
PyTorch · Official

Models

Videos

Model Extrapolation Expedites Alignment· underline

Taxonomy

Topics: Numerical Methods and Algorithms · Reservoir Engineering and Simulation Methods · Geophysics and Gravity Measurements

Methods: Direct Preference Optimization · Shrink and Fine-Tune · ALIGN