XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts

Yifeng Ding; Jiawei Liu; Yuxiang Wei; Terry Yue Zhuo; Lingming Zhang

arXiv:2404.15247 · cs.CL · June 10, 2024

TL;DR

XFT is a training scheme for instruction-tuned code LLMs: it upcycles a dense model into a Mixture-of-Experts (MoE) with a shared expert mechanism, fine-tunes it, and then merges the experts back into a dense model. The result is a state-of-the-art tiny code LLM (<3B) that keeps upcycled MoE-level performance at dense-model compute.

Contribution

XFT introduces a simple yet effective way to improve instruction tuning of code LLMs: sparse upcycling extended with a shared expert mechanism and routing weight normalization, followed by a learnable merging step that compiles the fine-tuned MoE back into a dense model.

Findings

01 Sets a new state of the art among tiny code LLMs (<3B): a 1.3B model reaches 67.1 pass@1 on HumanEval and 64.6 pass@1 on HumanEval+.

02 Improves on supervised fine-tuning (SFT) with the same data and model architecture by 13% on HumanEval+.

03 Demonstrates consistent performance gains across multiple benchmarks.

Abstract

We introduce XFT, a simple yet powerful training scheme, by simply merging upcycled Mixture-of-Experts (MoE) to unleash the performance limit of instruction-tuned code Large Language Models (LLMs). While vanilla sparse upcycling fails to improve instruction tuning, XFT introduces a shared expert mechanism with a novel routing weight normalization strategy into sparse upcycling, which significantly boosts instruction tuning. After fine-tuning the upcycled MoE model, XFT introduces a learnable model merging mechanism to compile the upcycled MoE model back to a dense model, achieving upcycled MoE-level performance with only dense-model compute. By applying XFT to a 1.3B model, we create a new state-of-the-art tiny code LLM (<3B) with 67.1 and 64.6 pass@1 on HumanEval and HumanEval+ respectively. With the same data and model architecture, XFT improves supervised fine-tuning (SFT) by 13% on…
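The merging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each expert's feed-forward weight matrix is combined into one dense matrix as a weighted sum, with mixing coefficients obtained from a softmax over learnable logits. The function names (`upcycle`, `merge_experts`) and the softmax parameterization are hypothetical choices made for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def upcycle(dense_w, num_experts):
    """Sparse upcycling: every expert starts as an independent copy
    of the dense feed-forward weight matrix (list of rows)."""
    return [[row[:] for row in dense_w] for _ in range(num_experts)]


def merge_experts(expert_ws, mixing_logits):
    """Collapse E expert matrices into one dense matrix.

    ASSUMPTION (illustrative, not taken from the paper): the mixing
    coefficients are softmax(mixing_logits), so they are non-negative
    and sum to 1. In training, the logits would be the learnable part.
    """
    alpha = softmax(mixing_logits)
    rows, cols = len(expert_ws[0]), len(expert_ws[0][0])
    return [
        [sum(a * w[i][j] for a, w in zip(alpha, expert_ws)) for j in range(cols)]
        for i in range(rows)
    ]
```

One property follows directly from this formulation: immediately after upcycling, all experts are identical, so merging with any coefficients that sum to 1 returns the original dense weights. Only after fine-tuning has made the experts diverge do the coefficients affect the merged model.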

Code & Models

Repositories

ise-uiuc/xft
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Machine Learning and Data Classification · Machine Learning and Algorithms

Methods: Weight Normalization · Mixture of Experts