How to Weight Multitask Finetuning? Fast Previews via Bayesian Model-Merging
Hugo Monzón Maldonado, Thomas Möllenhoff, Nico Daheim, Iryna Gurevych, Mohammad Emtiyaz Khan

TL;DR
This paper introduces a Bayesian model-merging approach to efficiently estimate task weights in multitask finetuning, enabling quick previews and improved performance without retraining.
Contribution
It proposes a Bayesian model-merging method to generate fast, accurate previews for task weighting in multitask finetuning, reducing search costs.
Findings
Bayesian merging improves preview quality over simple averaging.
Previews guide better task weighting in multitask models.
Method validated on vision and NLP transformer models.
Abstract
When finetuning on multiple tasks together, it is important to weight them carefully to get good performance, but searching for good weights can be difficult and costly. Here, we propose to aid the search with fast previews to quickly get a rough idea of different reweighting options. We use model merging to create previews by simply reusing and averaging parameters of models trained on each task separately (no retraining required). To improve the quality of previews, we propose a Bayesian approach to design new merging strategies by using more flexible posteriors. We validate our findings on vision and natural-language transformers. Our work shows the benefits of model merging via Bayes to improve multitask finetuning.
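The simplest kind of preview mentioned in the abstract, a weighted average of parameters from models trained separately on each task, might be sketched as follows. This is only an illustration of plain weighted averaging, not the Bayesian merging strategies the paper proposes; the function name `merge_preview`, the dictionary-of-lists parameter layout, and the toy values are all assumptions made for this example.

```python
from typing import Dict, List, Sequence

# Hypothetical parameter layout: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def merge_preview(task_params: Sequence[Params], weights: Sequence[float]) -> Params:
    """Return a weighted average of per-task parameters.

    Plain weighted averaging only; the weights are normalized to sum to 1.
    No retraining happens here: the per-task models are simply reused.
    """
    if not task_params or len(task_params) != len(weights):
        raise ValueError("need one weight per task model")
    total = float(sum(weights))
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative and not all zero")
    norm = [w / total for w in weights]

    merged: Params = {}
    for name, reference in task_params[0].items():
        # Average each coordinate of this parameter across the task models.
        merged[name] = [
            sum(w * params[name][i] for w, params in zip(norm, task_params))
            for i in range(len(reference))
        ]
    return merged


# Toy example with two "task models" (made-up values).
task_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
task_b = {"layer.weight": [3.0, 6.0], "layer.bias": [4.0]}

# Preview a 3:1 weighting of task A versus task B.
preview = merge_preview([task_a, task_b], weights=[3.0, 1.0])
```

Calling `merge_preview` again with different `weights` gives a preview for another weighting at the cost of one pass over the parameters, which is what makes scanning many weightings cheap compared with rerunning multitask finetuning for each one.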
Taxonomy
Topics: Computer Graphics and Visualization Techniques
