EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs

Song Guo; Fan Wu; Lei Zhang; Xiawu Zheng; Shengchuan Zhang; Fei Chao; Yiyu Shi; Rongrong Ji

arXiv:2402.12419 · cs.LG · February 21, 2024 · 1 citation

Open Access · 1 repository

TL;DR

EBFT is a block-wise fine-tuning framework for sparse LLMs. It minimizes reconstruction error one block at a time on a small calibration set, and fine-tuning takes approximately 30 minutes on a single 16GB GPU.

Contribution

The paper introduces EBFT, a method for fine-tuning sparse LLMs that uses backpropagation to optimize block-wise reconstruction error, and reports lower perplexity than baselines such as DSnoT and LoRA.

Findings

01. Achieves perplexity of 16.88 on Wikitext2 with LlamaV1-7B at 70% sparsity.

02. Outperforms state-of-the-art methods like DSnoT and LoRA in perplexity.

03. Fine-tuning takes approximately 30 minutes on a single 16GB GPU.

Abstract

Existing methods for fine-tuning sparse LLMs often suffer from resource-intensive requirements and high retraining costs. Additionally, many fine-tuning methods rely on approximations or heuristic optimization strategies, which may lead to suboptimal solutions. To address these issues, we propose an efficient and fast framework for fine-tuning sparse LLMs based on minimizing reconstruction error. Our approach involves sampling a small dataset for calibration and utilizing backpropagation to iteratively optimize the reconstruction error block by block, aiming for optimal solutions. Extensive experiments on various benchmarks consistently demonstrate the superiority of our method over other baselines. For instance, on the Wikitext2 dataset with LlamaV1-7B at 70% sparsity, our proposed EBFT achieves a perplexity of 16.88, surpassing the state-of-the-art DSnoT with…
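
The abstract's core idea, tuning the kept weights of a pruned block so that its output matches the dense block's output on a small calibration set, can be illustrated with a toy sketch. This is not the authors' implementation: the "block" here is assumed to be a single linear map, the mask is fixed by hand, the gradient is written out manually instead of using backpropagation through a transformer block, and every name (`finetune_block`, `recon_error`, `make_calibration_set`, the weights and mask) is hypothetical.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def apply_mask(W, mask):
    """Zero out pruned weights; mask entries are 1 (kept) or 0 (pruned)."""
    return [[w * m for w, m in zip(w_row, m_row)] for w_row, m_row in zip(W, mask)]

def recon_error(W_sparse, W_dense, samples):
    """Mean squared difference between sparse and dense block outputs."""
    total = 0.0
    for x in samples:
        y_s = matvec(W_sparse, x)
        y_d = matvec(W_dense, x)
        total += sum((a - b) ** 2 for a, b in zip(y_s, y_d))
    return total / len(samples)

def finetune_block(W_dense, mask, samples, lr=0.1, steps=300):
    """Gradient descent on the reconstruction error of one block.

    Only kept weights are updated; pruned positions stay at zero because
    the gradient is multiplied by the mask.
    """
    W = apply_mask(W_dense, mask)  # start from the pruned dense weights
    n = len(samples)
    for _ in range(steps):
        grad = [[0.0] * len(row) for row in W]
        for x in samples:
            y_s = matvec(W, x)
            y_d = matvec(W_dense, x)
            for i, (a, b) in enumerate(zip(y_s, y_d)):
                r = 2.0 * (a - b) / n  # d(error)/d(y_s[i])
                for j, xj in enumerate(x):
                    grad[i][j] += r * xj * mask[i][j]
        W = [[w - lr * g for w, g in zip(w_row, g_row)]
             for w_row, g_row in zip(W, grad)]
    return W

def make_calibration_set(n=32, seed=0):
    """Small synthetic calibration set with correlated input features."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        u, v = rng.uniform(-1, 1), rng.uniform(-1, 1)
        samples.append([u, u + 0.1 * rng.uniform(-1, 1),
                        v, v + 0.1 * rng.uniform(-1, 1)])
    return samples

# Hypothetical dense weights and a hand-picked 50% sparsity mask.
W_DENSE = [[0.5, -0.4, 0.3, 0.2], [-0.3, 0.6, 0.1, -0.5]]
MASK = [[1, 0, 1, 0], [0, 1, 0, 1]]
SAMPLES = make_calibration_set()

ERR_BEFORE = recon_error(apply_mask(W_DENSE, MASK), W_DENSE, SAMPLES)
W_TUNED = finetune_block(W_DENSE, MASK, SAMPLES)
ERR_AFTER = recon_error(W_TUNED, W_DENSE, SAMPLES)
```

In a block-by-block pass, the same step would presumably be repeated for each block in turn. How inputs and targets are propagated between blocks is not specified in the text above, so the sketch stops at a single block.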

Code & Models

Repositories

sunggo/ebft (PyTorch, official)

Taxonomy

Topics: Network Packet Processing and Optimization · Natural Language Processing Techniques