How to Weight Multitask Finetuning? Fast Previews via Bayesian Model-Merging

Hugo Monzón Maldonado; Thomas Möllenhoff; Nico Daheim; Iryna Gurevych; Mohammad Emtiyaz Khan

arXiv:2412.08147·cs.LG·December 12, 2024

Open Access

TL;DR

This paper uses Bayesian model merging to produce fast previews of how different task weightings would perform in multitask finetuning, so that candidate weightings can be compared without retraining a model for each one.

Contribution

It proposes a Bayesian model-merging method to generate fast, accurate previews for task weighting in multitask finetuning, reducing search costs.

Findings

01

Bayesian merging improves preview quality over simple averaging.

02

Previews guide better task weighting in multitask models.

03

Method validated on vision and NLP transformer models.

Abstract

When finetuning on multiple tasks together, it is important to weight them carefully to get good performance, but searching for good weights can be difficult and costly. Here, we propose to aid the search with fast previews that quickly give a rough idea of different reweighting options. We use model merging to create previews by simply reusing and averaging parameters of models trained on each task separately (no retraining required). To improve the quality of previews, we propose a Bayesian approach to design new merging strategies by using more flexible posteriors. We validate our findings on vision and natural-language transformers. Our work shows the benefits of model merging via Bayes to improve multitask finetuning.
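The simplest form of preview described in the abstract, averaging the parameters of separately trained task models, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `merge_weighted`, the dictionary-of-lists parameter layout, the toy values, and the choice to normalize the task weights so they sum to one are all assumptions made for the example. The Bayesian merging strategies the paper proposes are not shown here.

```python
from typing import Dict, List, Sequence

# Hypothetical parameter layout: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def merge_weighted(task_params: Sequence[Params], weights: Sequence[float]) -> Params:
    """Return a weighted average of per-task parameters.

    Weights are normalized to sum to 1 (an assumption of this sketch).
    No training happens here: the merged model only reuses existing parameters.
    """
    if len(task_params) != len(weights):
        raise ValueError("need exactly one weight per task model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]

    merged: Params = {}
    for name in task_params[0]:
        # Group the i-th entry of this parameter across all task models.
        per_entry = zip(*(params[name] for params in task_params))
        merged[name] = [
            sum(a * value for a, value in zip(norm, values))
            for values in per_entry
        ]
    return merged


# Toy "models" finetuned separately on two tasks (made-up values).
theta_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
theta_b = {"layer.weight": [3.0, 6.0], "layer.bias": [4.0]}

# One merged preview per candidate task weighting.
candidate_weights = [(1.0, 0.0), (0.5, 0.5), (0.25, 0.75)]
previews = {w: merge_weighted([theta_a, theta_b], w) for w in candidate_weights}
```

Each entry of `previews` is a merged parameter set that could then be evaluated on held-out data for every task, giving a rough picture of how that weighting trades the tasks off before committing to a full multitask finetuning run.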

Taxonomy

Topics: Computer Graphics and Visualization Techniques