Localizing Task Information for Improved Model Merging and Compression

Ke Wang; Nikolaos Dimitriadis; Guillermo Ortiz-Jimenez; François Fleuret; Pascal Frossard

arXiv:2405.07813 · cs.LG · May 14, 2024

TL;DR

This paper introduces TALL-masks and Consensus Merging to improve multi-task model merging by identifying task-specific weights and removing detrimental ones, achieving high accuracy retention and significant compression.

Contribution

The paper presents TALL-masks for task support identification and Consensus Merging to eliminate selfish weights, enhancing model merging and compression in multi-task learning.

Findings

01

Recovers over 99% of single-task accuracy by applying the task masks to the multi-task vector.

02

Reduces storage from 57 GB to 8.2 GB with minimal performance loss.

03

Improves existing model merging methods across vision and NLP tasks.

Abstract

Model merging and task arithmetic have emerged as promising scalable approaches to merge multiple single-task checkpoints into one multi-task model, but their applicability is reduced by significant performance loss. Previous works have linked these drops to interference in the weight space and erasure of important task-specific features. Instead, in this work we show that the information required to solve each task is still preserved after merging, as different tasks mostly use non-overlapping sets of weights. We propose TALL-masks, a method to identify these task supports given a collection of task vectors, and show that one can retrieve >99% of the single-task accuracy by applying our masks to the multi-task vector, effectively compressing the individual checkpoints. We study the statistics of intersections among constructed masks and reveal the existence of selfish and catastrophic…
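The masking idea in the abstract can be illustrated with a minimal NumPy sketch. The abstract as shown here does not state the exact mask rule, so the thresholding criterion below (keep a weight for task t when its task-vector magnitude is at least `lam` times the magnitude of the rest of the multi-task vector) and the consensus rule (keep weights selected by at least `k` task masks) are assumptions for illustration. The function names, the toy weights, and the parameters `lam` and `k` are hypothetical and not taken from the authors' code.

```python
import numpy as np

def task_vectors(pretrained, finetuned):
    """One task vector per task: fine-tuned weights minus pretrained weights."""
    return [theta_t - pretrained for theta_t in finetuned]

def tall_masks(task_vecs, lam=1.0):
    """Assumed rule: weight i belongs to task t's support when
    |tau_t[i]| >= lam * |tau_mtl[i] - tau_t[i]|."""
    tau_mtl = np.sum(task_vecs, axis=0)  # multi-task vector (sum of task vectors)
    masks = [np.abs(tau) >= lam * np.abs(tau_mtl - tau) for tau in task_vecs]
    return tau_mtl, masks

def consensus_mask(masks, k=2):
    """Assumed rule: keep only weights selected by at least k task masks."""
    return np.sum(masks, axis=0) >= k

# Toy example with two tasks and four weights (values are made up).
pretrained = np.zeros(4)
finetuned = [
    np.array([1.0, 0.0, 0.5, 0.25]),    # task A checkpoint
    np.array([0.0, 2.0, 0.5, -0.125]),  # task B checkpoint
]

taus = task_vectors(pretrained, finetuned)
tau_mtl, masks = tall_masks(taus, lam=1.0)

# Per-task reconstruction: pretrained weights plus the masked multi-task vector.
theta_a = pretrained + masks[0] * tau_mtl
theta_b = pretrained + masks[1] * tau_mtl

# Weights used by fewer than k tasks are dropped before merging.
consensus = consensus_mask(masks, k=2)
tau_consensus = consensus * tau_mtl
```

Under these assumptions, only one multi-task vector plus one binary mask per task needs to be stored instead of every fine-tuned checkpoint, which is the compression angle the abstract describes.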

Code & Models

Models

nik-dim/tall_masks (Hugging Face model)

Taxonomy

Topics: Neural Networks and Applications · Advanced Data Compression Techniques · Parallel Computing and Optimization Techniques