CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model

Kaiyan Zhang, Ning Ding, Biqing Qi, Xuekai Zhu, Xinwei Long, Bowen Zhou

arXiv:2310.15477 · cs.CL · October 25, 2023 · 1 citation

TL;DR

This paper introduces CRaSh, a training-free method that enhances offsite-tuning of large language models by exploiting modular structures, leading to improved emulator performance without full model fine-tuning.

Contribution

It reveals a modular layer structure in LLMs and proposes CRaSh, a novel strategy for better offsite-tuning performance without requiring full model training.

Findings

01 Modular structure emerges as model size increases.

02 CRaSh significantly improves offsite-tuning performance.

03 Optima from fine-tuning with and without full models are linearly connected.
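
As an illustration of what "linearly connected" optima means in finding 03, the sketch below interpolates between two parameter vectors and measures how far the loss along that straight path rises above the line joining the endpoint losses. It is a generic toy check, not the paper's evaluation code: `interpolate`, `loss_barrier`, `toy_loss`, and the two-dimensional parameters are hypothetical names and values chosen for this example.

```python
import numpy as np


def interpolate(theta_a, theta_b, alpha):
    """Point on the straight line between two parameter vectors."""
    return (1.0 - alpha) * theta_a + alpha * theta_b


def loss_barrier(loss_fn, theta_a, theta_b, num_points=11):
    """Largest excess of the path loss over the linear blend of endpoint losses.

    A value near zero (or below) suggests the two solutions are linearly
    connected under this loss; a large positive value indicates a barrier.
    """
    alphas = np.linspace(0.0, 1.0, num_points)
    path = np.array([loss_fn(interpolate(theta_a, theta_b, a)) for a in alphas])
    baseline = (1.0 - alphas) * path[0] + alphas * path[-1]
    return float(np.max(path - baseline))


def toy_loss(theta):
    """Hypothetical stand-in for a task loss (a convex quadratic bowl)."""
    return float(np.sum(theta ** 2))


# Hypothetical "optima": in practice these would be the weights reached by
# fine-tuning with and without the full model.
theta_a = np.array([1.0, 0.0])
theta_b = np.array([0.0, 1.0])

barrier = loss_barrier(toy_loss, theta_a, theta_b)
print(f"loss barrier along the linear path: {barrier:.4f}")
```

With real models, `theta_a` and `theta_b` would be flattened weight vectors and `loss_fn` would evaluate the interpolated weights on held-out data, which is considerably more expensive than this toy version.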

Abstract

Instruction tuning has recently been recognized as an effective way of aligning Large Language Models (LLMs) to enhance their generalization ability across various tasks. However, when tuning publicly accessible, centralized LLMs with private instruction data, privacy concerns are inevitable. While direct transfer of parameterized modules between models is a plausible approach to address this, its implications and effectiveness need further exploration. This paper focuses on Offsite-Tuning (OFT), a representative technique that transfers transformer blocks between centralized LLMs and downstream emulators. Given the limited understanding of the underlying mechanism of OFT, we perform an empirical analysis on LLMs from the perspectives of representation and functional similarity. Interestingly, our findings reveal a unique modular structure within the layers of LLMs that appears to…
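
The abstract mentions analysing layers by representation similarity. One widely used measure for this is linear centered kernel alignment (CKA); the truncated abstract does not say which measure the authors use, so the sketch below is an assumption for illustration only. The function names and the random "activations" are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape (n_samples, dim)."""
    # Center each feature so the measure ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def layer_similarity_matrix(layer_acts):
    """Pairwise linear CKA between per-layer activations on the same inputs."""
    n = len(layer_acts)
    sim = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = linear_cka(layer_acts[i], layer_acts[j])
    return sim


# Hypothetical activations for three "layers" on 64 inputs: the second is a
# rescaled copy of the first, the third is unrelated noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(64, 16))
layer_acts = [base, 2.0 * base, rng.normal(size=(64, 16))]

sim = layer_similarity_matrix(layer_acts)
print(np.round(sim, 3))
```

Blocks of adjacent layers with high mutual similarity in such a matrix would show up as square regions along the diagonal, which is one way a modular layer structure could be made visible.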

Code & Models

Repositories

tsinghuac3i/crash (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques