StatsMerging: Statistics-Guided Model Merging via Task-Specific Teacher Distillation
Ranjith Merugu, Bryan Bo Cao, Shubham Jain

TL;DR
StatsMerging is a novel, lightweight model merging method that uses weight distribution statistics and task-specific teacher distillation to effectively combine models without ground truth labels, improving accuracy and robustness.
Contribution
It introduces a new merging approach leveraging SVD-based weight importance, a lightweight learner for distribution modeling, and task-specific teacher distillation for heterogeneous models.
Findings
Outperforms state-of-the-art methods in accuracy
Demonstrates strong generalization to unseen tasks
Shows robustness to image quality variations
Abstract
Model merging has emerged as a promising solution to accommodate multiple large models within constrained memory budgets. We present StatsMerging, a novel lightweight learning-based model merging method guided by weight distribution statistics that requires neither ground truth labels nor test samples. StatsMerging offers three key advantages: (1) it uniquely leverages singular values from singular value decomposition (SVD) to capture task-specific weight distributions, serving as a proxy for task importance to guide task coefficient prediction; (2) it employs a lightweight learner, StatsMergeLearner, to model the weight distributions of task-specific pre-trained models, improving generalization and enhancing adaptation to unseen samples; (3) it introduces Task-Specific Teacher Distillation for merging vision models with heterogeneous architectures, a merging learning paradigm that avoids…
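To make the idea of statistics-guided merging concrete, here is a minimal NumPy sketch, not the authors' implementation. It computes per-task weight statistics (mean, standard deviation, and top singular values from SVD) and then forms a coefficient-weighted sum of the task weights. The function names, the choice of three singular values, and in particular `placeholder_coefficients` are assumptions for illustration: the paper predicts coefficients with a learned StatsMergeLearner, whereas this sketch substitutes a fixed softmax over the largest singular value purely so the example runs end to end.

```python
import numpy as np


def weight_stats(weight, k=3):
    """Return [mean, std, top-k singular values] of a 2-D weight matrix.

    The feature layout is an assumption made for this sketch.
    """
    # compute_uv=False returns only the singular values, sorted descending.
    singular_values = np.linalg.svd(weight, compute_uv=False)
    return np.concatenate(([weight.mean(), weight.std()], singular_values[:k]))


def placeholder_coefficients(stats_per_task):
    """Hypothetical stand-in for a learned coefficient predictor.

    Applies a softmax to each task's largest singular value (index 2 in the
    feature vector). This is NOT the StatsMergeLearner from the paper.
    """
    scores = np.array([s[2] for s in stats_per_task])
    scores = scores - scores.max()  # subtract the max for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()


def merge_weights(task_weights, coefficients):
    """Coefficient-weighted sum of same-shaped weight matrices."""
    stacked = np.stack(task_weights)  # shape: (tasks, rows, cols)
    return np.tensordot(coefficients, stacked, axes=1)


# Two synthetic "task-specific" weight matrices with different scales.
rng = np.random.default_rng(0)
task_weights = [rng.normal(scale=s, size=(8, 4)) for s in (0.5, 1.0)]

stats = [weight_stats(w) for w in task_weights]
coeffs = placeholder_coefficients(stats)
merged = merge_weights(task_weights, coeffs)
```

In this sketch the coefficients sum to one, so the merged matrix is a convex combination of the task weights. A learned predictor would replace the fixed softmax rule with a small model trained on the statistics.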
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
