E-PMQ: Expert-Guided Post-Merge Quantization with Merged-Weight Anchoring

Wenjun Wang; Yanggan Gu; Shuo Cai; Yuanyi Wang; Pengkai Wang; Jianmin Wu; Hongxia Yang

arXiv:2605.16882 · cs.CL · May 19, 2026


TL;DR

E-PMQ introduces an expert-guided post-merge quantization framework that stabilizes and improves low-bit model deployment by leveraging source expert weights and merged-weight anchoring.

Contribution

The paper proposes E-PMQ, a post-merge quantization method that improves low-bit model accuracy by mitigating both the expert-relative merging deviation and the quantization deviation, using guidance from the source experts and merged-weight anchoring.

Findings

1. E-PMQ improves 4-bit GPTQ accuracy from 65.0% to 73.6% on eight-task merging with CLIP-ViT-B/32.

2. E-PMQ increases GPTQ accuracy from 34.8% to 76.7% on 20-task merging with CLIP-ViT-L/14.

3. E-PMQ achieves 83.34% accuracy on GLUE with FLAN-T5-base, outperforming prior methods.

Abstract

Low-resource deployment constraints have made model quantization essential for deploying neural networks while preserving performance. Meanwhile, model merging has become an increasingly practical low-resource strategy for integrating multiple task- or domain-specialized experts into a single model without joint training or multi-model serving. Together, quantization and model merging enable an efficient low-resource deployment pipeline by integrating multiple experts into one low-bit model. We formulate this setting as Post-Merge Quantization (PMQ). We show that directly applying post-training quantization (PTQ) to a merged model is unreliable because two distinct deviations are coupled: the quantization deviation introduced by low-bit reconstruction and the expert-relative merging deviation inherited from model merging. To mitigate these deviations, we propose E-PMQ, an expert-guided…
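The abstract's point that two deviations are coupled can be made concrete with a small decomposition. For a source expert with weights W_k, merged weights W_m, and quantizer Q, the deployed error splits exactly as Q(W_m) - W_k = (Q(W_m) - W_m) + (W_m - W_k): the first term is the quantization deviation and the second is the expert-relative merging deviation. The sketch below is a toy illustration under stated assumptions, not the E-PMQ algorithm: it assumes task arithmetic as the merging rule and uses per-tensor round-to-nearest quantization as a stand-in for GPTQ, and all helper names (`task_arithmetic_merge`, `rtn_quantize`, `decompose`) are hypothetical.

```python
import random


def task_arithmetic_merge(base, experts, lam):
    """Hypothetical helper: merge experts as base + lam * sum_k (W_k - base).

    Task arithmetic is an assumed merging rule chosen for illustration only.
    """
    return [b + lam * sum(e[i] - b for e in experts) for i, b in enumerate(base)]


def rtn_quantize(weights, bits):
    """Hypothetical helper: symmetric per-tensor round-to-nearest quantization.

    A simple stand-in for GPTQ, which is not implemented here.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid a zero scale
    return [max(-qmax - 1, min(qmax, round(w / scale))) * scale for w in weights]


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def decompose(quantized, merged, expert):
    """Split the total error against one expert into its two parts."""
    quant_dev = sub(quantized, merged)  # quantization deviation: Q(W_m) - W_m
    merge_dev = sub(merged, expert)     # expert-relative merging deviation: W_m - W_k
    total_dev = sub(quantized, expert)  # error of the deployed low-bit model vs. expert k
    return quant_dev, merge_dev, total_dev


if __name__ == "__main__":
    random.seed(0)
    base = [random.gauss(0.0, 1.0) for _ in range(16)]
    experts = [[b + random.gauss(0.0, 0.3) for b in base] for _ in range(3)]
    merged = task_arithmetic_merge(base, experts, lam=0.3)
    quantized = rtn_quantize(merged, bits=4)

    for k, expert in enumerate(experts):
        q, m, t = decompose(quantized, merged, expert)
        # The split is exact up to floating-point rounding.
        assert all(abs(ti - (qi + mi)) < 1e-12 for ti, qi, mi in zip(t, q, m))
        print(f"expert {k}: quant={dot(q, q):.4f} merge={dot(m, m):.4f} "
              f"cross={2 * dot(q, m):+.4f} total={dot(t, t):.4f}")
```

Each printed line satisfies ||Q(W_m) - W_k||^2 = ||quant||^2 + ||merge||^2 + 2⟨quant, merge⟩ (up to display rounding). A PTQ method applied directly to the merged model measures its error against W_m, so it controls roughly the first term only; the merging term and the cross term that couples the two are left untouched. How E-PMQ accounts for these terms is described in the paper and is not reproduced by this sketch.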


Code & Models

Repositories

wwjzhy/E-PMQ (GitHub)
