LARV: Data-Free Layer-wise Adaptive Rescaling Veneer for Model Merging
Xinyu Wang, Ke Deng, Fei Dou, Jinbo Bi, Jin Lu

TL;DR
LARV is a data-free method that assigns a per-layer scale to each task vector before vision transformer models are merged, improving merged-model performance by suppressing shallow-layer interference and amplifying deeper-layer alignment.
Contribution
It introduces the first layer-aware scaling technique for task-vector merging that boosts existing merging rules without retraining or data access.
Findings
LARV improves merging performance across multiple benchmarks.
It effectively suppresses shallow-layer interference.
LARV enhances robustness against data corruption.
Abstract
Model merging aims to combine multiple fine-tuned models into a single multi-task model without access to training data. Existing task-vector merging methods such as TIES, TSV-M, and Iso-C/CTS differ in their aggregation rules but treat all layers nearly uniformly. This assumption overlooks the strong layer-wise heterogeneity in large vision transformers, where shallow layers are sensitive to interference while deeper layers encode stable task-specific features. We introduce LARV, a training-free, data-free, merger-agnostic Layer-wise Adaptive Rescaling Veneer that plugs into any task-vector merger and assigns a per-layer scale to each task vector before aggregation, and show it consistently boosts diverse merging rules. LARV adaptively suppresses shallow-layer interference and amplifies deeper-layer alignment using a simple deterministic schedule, requiring no retraining or…
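The abstract describes a per-layer scale applied to each task vector before aggregation, with any task-vector merger plugged in afterwards. The sketch below illustrates that pipeline only in outline. The linear ramp in `layer_scales`, its `shallow` and `deep` endpoints, and the `sum_merge` stand-in merger are all assumptions made for illustration; the excerpt does not specify the schedule LARV actually uses.

```python
from typing import Callable, Dict, List

Vector = List[float]
TaskVector = Dict[int, Vector]  # layer index -> flattened weight delta


def layer_scales(num_layers: int, shallow: float = 0.5, deep: float = 1.5) -> List[float]:
    """Hypothetical deterministic schedule: a linear ramp from `shallow`
    (first layer) to `deep` (last layer). Not the paper's schedule."""
    if num_layers == 1:
        return [deep]
    step = (deep - shallow) / (num_layers - 1)
    return [shallow + step * i for i in range(num_layers)]


def rescale(task_vector: TaskVector, scales: List[float]) -> TaskVector:
    """Multiply every layer's delta by that layer's scale."""
    return {layer: [scales[layer] * x for x in delta]
            for layer, delta in task_vector.items()}


def sum_merge(task_vectors: List[TaskVector]) -> TaskVector:
    """Stand-in merger (assumption): element-wise sum of task vectors."""
    merged: TaskVector = {}
    for tv in task_vectors:
        for layer, delta in tv.items():
            base = merged.get(layer, [0.0] * len(delta))
            merged[layer] = [a + b for a, b in zip(base, delta)]
    return merged


def merge_with_layer_scaling(
    task_vectors: List[TaskVector],
    num_layers: int,
    merger: Callable[[List[TaskVector]], TaskVector] = sum_merge,
) -> TaskVector:
    """Rescale each task vector per layer, then hand off to any merger."""
    scales = layer_scales(num_layers)
    return merger([rescale(tv, scales) for tv in task_vectors])


# Two toy task vectors over a 3-layer model.
tv_a = {0: [1.0, 2.0], 1: [1.0, 2.0], 2: [1.0, 2.0]}
tv_b = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [1.0, 0.0]}
merged = merge_with_layer_scaling([tv_a, tv_b], num_layers=3)
```

Because the scaling happens before the merger is called, a different aggregation rule can be substituted through the `merger` argument without touching the schedule, which is the sense in which the abstract calls the approach merger-agnostic.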
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Domain Adaptation and Few-Shot Learning
