Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction

Tingchen Fu; Deng Cai; Lemao Liu; Shuming Shi; Rui Yan

arXiv:2405.13432·cs.CL·May 24, 2024

TL;DR

This paper introduces a disperse-then-merge framework for instruction tuning of large language models, reducing alignment tax and improving performance on knowledge and reasoning benchmarks by training sub-models on data portions and merging them.

Contribution

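The paper proposes a disperse-then-merge approach: the instruction-following data is split into portions, a sub-model is trained on each portion, and the sub-models are merged into a single model. The approach outperforms data curation and training regularization baselines by mitigating data bias and the alignment tax associated with it.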

Findings

01 Outperforms data curation and regularization methods

02 Reduces alignment tax during instruction tuning

03 Improves knowledge and reasoning benchmark scores

Abstract

Supervised fine-tuning (SFT) on an instruction-following corpus is a crucial approach toward the alignment of large language models (LLMs). However, the performance of LLMs on standard knowledge and reasoning benchmarks tends to deteriorate in the later stages of the SFT process, echoing the phenomenon of alignment tax. Through our pilot study, we put forward the hypothesis that data biases are probably one cause of the phenomenon. To address the issue, we introduce a simple disperse-then-merge framework. Concretely, we disperse the instruction-following data into portions and train multiple sub-models using different data portions. Then we merge the multiple models into a single one via model merging techniques. Despite its simplicity, our framework outperforms various sophisticated methods such as data curation and training regularization on a series of standard knowledge and…
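The disperse-then-merge procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only says "model merging techniques", so uniform parameter averaging is assumed here as one simple choice, and `disperse`, `merge_uniform`, `disperse_then_merge`, and `toy_train` are hypothetical names introduced for illustration. Parameters are plain Python lists rather than real model weights.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for model weights: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def disperse(data: Sequence, k: int, seed: int = 0) -> List[list]:
    """Shuffle the examples and split them into k near-equal portions."""
    if k < 1:
        raise ValueError("k must be at least 1")
    items = list(data)
    random.Random(seed).shuffle(items)
    # Striding assigns every k-th example to the same portion.
    return [items[i::k] for i in range(k)]


def merge_uniform(models: Sequence[Params]) -> Params:
    """Average each parameter element-wise across sub-models.

    Assumption: uniform averaging is only one possible merging technique;
    the source does not specify which one is used.
    """
    if not models:
        raise ValueError("need at least one sub-model to merge")
    merged: Params = {}
    for name in models[0]:
        columns = zip(*(m[name] for m in models))
        merged[name] = [sum(col) / len(models) for col in columns]
    return merged


def disperse_then_merge(
    data: Sequence,
    k: int,
    train_fn: Callable[[Params, list], Params],
    init: Params,
    seed: int = 0,
) -> Params:
    """Train one sub-model per data portion from the same init, then merge."""
    portions = disperse(data, k, seed)
    sub_models = [train_fn(init, portion) for portion in portions]
    return merge_uniform(sub_models)


def toy_train(init: Params, portion: list) -> Params:
    """Toy 'training' step: shift every weight by the mean of the portion."""
    shift = sum(portion) / len(portion)
    return {name: [w + shift for w in ws] for name, ws in init.items()}


if __name__ == "__main__":
    merged = disperse_then_merge(
        data=[1.0, 2.0, 3.0, 4.0], k=2, train_fn=toy_train, init={"w": [0.0, 0.0]}
    )
    print(merged)
```

In a real setting, `train_fn` would run SFT on one data portion starting from the shared base model, and the merge would operate on the resulting weight tensors. Every sub-model starts from the same initialization, which is what makes element-wise averaging meaningful.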

Code & Models

Repositories

TingchenFu/ACL24-ExpertFusion
PyTorch, official

Taxonomy

Topics: ICT Impact and Policies

Methods: Shrink and Fine-Tune