1bit-Merging: Dynamic Quantized Merging for Large Language Models

Shuqi Liu; Yuxuan Yao; Bowei He; Zehua Liu; Xiongwei Han; Mingxuan Yuan; Han Wu; Linqi Song

arXiv:2502.10743·cs.CL·May 26, 2025

TL;DR

1bit-Merging introduces a novel method combining task-specific routing with 1-bit quantization to efficiently merge large language models, balancing performance and storage savings across diverse tasks.

Contribution

The paper proposes 1bit-Merging, a new framework that integrates task-specific routing with 1-bit quantized vectors, enabling efficient model merging with minimal performance loss.

Findings

01 Achieves comparable or better performance than existing methods.

02 Significantly reduces storage requirements.

03 Effective across various model families and tasks.

Abstract

Recent advances in large language models have led to specialized models excelling in specific domains, creating a need for efficient model merging techniques. Traditional merging approaches combine parameters into a single static model, but they often compromise task-specific performance. Task-specific routing methods, by contrast, maintain accuracy but introduce substantial storage overhead. We present 1bit-Merging, a novel framework that integrates task-specific routing with 1-bit quantized task vectors to balance performance and storage efficiency. Our approach leverages the observation that different task-specific models store knowledge in distinct layers: chat models primarily in attention layers, and math/code models in MLP layers. This enables targeted compression strategies. Through extensive experiments with LLaMA2 and Mistral model families across chat, mathematical reasoning,…
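
To make the idea of a 1-bit quantized task vector concrete, here is a minimal sketch in Python with NumPy. The abstract does not give the quantization rule, so the scheme below is an assumption: the task vector is taken as the difference between fine-tuned and base weights, each entry is reduced to its sign, and a single scale equal to the mean absolute value is kept per matrix. The function names `quantize_task_vector` and `reconstruct` are hypothetical and are not from the paper.

```python
import numpy as np

def quantize_task_vector(finetuned, base):
    """Reduce the task vector (finetuned - base) to signs plus one scale.

    Assumed scheme, not confirmed by the source: sign quantization with
    the mean absolute value of the task vector as the scale.
    """
    delta = finetuned - base
    scale = np.abs(delta).mean()
    signs = np.where(delta >= 0, 1, -1).astype(np.int8)
    return signs, scale

def reconstruct(base, signs, scale):
    """Approximate the fine-tuned weights from the base and the 1-bit vector."""
    return base + scale * signs

# Toy weights standing in for one layer of a base and a fine-tuned model.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 4))
finetuned = base + 0.01 * rng.normal(size=(4, 4))

signs, scale = quantize_task_vector(finetuned, base)
approx = reconstruct(base, signs, scale)

# Storing one bit per weight: 16 signs pack into 2 bytes.
packed = np.packbits(signs > 0)
```

Under this assumption, each task costs one bit per weight plus one floating-point scale per matrix, which is where the storage saving over keeping full task-specific weights would come from. A router would then pick which stored sign matrix to apply to the shared base model for a given input.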

Taxonomy

Topics: Topic Modeling · Semantic Web and Ontologies · Multi-Agent Systems and Negotiation

Methods: Softmax · Attention Is All You Need