1bit-Merging: Dynamic Quantized Merging for Large Language Models
Shuqi Liu, Yuxuan Yao, Bowei He, Zehua Liu, Xiongwei Han, Mingxuan Yuan, Han Wu, Linqi Song

TL;DR
1bit-Merging introduces a novel method combining task-specific routing with 1-bit quantization to efficiently merge large language models, balancing performance and storage savings across diverse tasks.
Contribution
The paper proposes 1bit-Merging, a new framework that integrates task-specific routing with 1-bit quantized vectors, enabling efficient model merging with minimal performance loss.
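To make the idea of a 1-bit quantized task vector concrete, here is a minimal sketch in plain Python. It assumes a task vector is the element-wise difference between fine-tuned and base weights, and that compression keeps one sign per weight plus a single scale equal to the mean absolute value. The scale choice and the function names are assumptions of this sketch, not details confirmed by the summary above.

```python
def task_vector(finetuned, base):
    """Element-wise difference between fine-tuned and base weights."""
    return [f - b for f, b in zip(finetuned, base)]


def quantize_1bit(delta):
    """Keep one sign per weight plus one shared float scale.

    Assumption: the scale is the mean absolute value of the task vector.
    """
    scale = sum(abs(d) for d in delta) / len(delta)
    signs = [1 if d >= 0 else -1 for d in delta]
    return signs, scale


def dequantize(signs, scale):
    """Rebuild an approximate task vector from signs and scale."""
    return [s * scale for s in signs]


# Toy weights, chosen so the arithmetic is exact in floating point.
base = [0.5, -0.25, 1.0, 0.0]
finetuned = [0.75, -0.75, 1.25, -0.5]

delta = task_vector(finetuned, base)
signs, scale = quantize_1bit(delta)

# Approximate fine-tuned weights: base plus the dequantized task vector.
approx = [b + d for b, d in zip(base, dequantize(signs, scale))]
```

Under these assumptions each weight of the task vector costs one bit instead of a full floating-point value, at the price of replacing every magnitude with the shared scale.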
Findings
Achieves comparable or better performance than existing methods.
Significantly reduces storage requirements.
Effective across the LLaMA2 and Mistral model families on tasks including chat and mathematical reasoning.
Abstract
Recent advances in large language models have led to specialized models excelling in specific domains, creating a need for efficient model merging techniques. Traditional merging approaches combine parameters into a single static model but often compromise task-specific performance, whereas task-specific routing methods maintain accuracy but introduce substantial storage overhead. We present 1bit-Merging, a novel framework that integrates task-specific routing with 1-bit quantized task vectors to balance performance and storage efficiency. Our approach leverages the observation that different task-specific models store knowledge in distinct layers (chat models primarily in attention layers, math and code models primarily in MLP layers), which enables targeted compression strategies. Through extensive experiments with LLaMA2 and Mistral model families across chat, mathematical reasoning,…
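The combination of routing with compressed task vectors can be illustrated with a small sketch. It assumes the task label is already known (the actual routing mechanism is not described in this summary), that each task is stored as a list of signs plus one scale, and that unknown tasks fall back to the base weights. All names and values here are hypothetical.

```python
def reconstruct(base, signs, scale):
    """Add a dequantized 1-bit task vector back onto the base weights."""
    return [b + s * scale for b, s in zip(base, signs)]


def route(task, base, compressed):
    """Return the weights to use for `task`.

    Assumption: tasks without a stored task vector use the base weights.
    """
    if task not in compressed:
        return list(base)
    signs, scale = compressed[task]
    return reconstruct(base, signs, scale)


# One shared base model plus one (signs, scale) pair per task.
base = [0.5, -0.25, 1.0, 0.0]
compressed = {
    "chat": ([1, -1, 1, -1], 0.125),
    "math": ([-1, -1, 1, 1], 0.25),
}

chat_weights = route("chat", base, compressed)
math_weights = route("math", base, compressed)
other_weights = route("code", base, compressed)
```

In this arrangement only the base weights are stored at full precision, and each additional task adds one bit per weight and one scale, which is the storage trade-off the abstract describes.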
Taxonomy
Topics: Topic Modeling · Semantic Web and Ontologies · Multi-Agent Systems and Negotiation
Methods: Softmax · Attention Is All You Need
