ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning
Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen

TL;DR
ColD Fusion introduces a distributed, continual multitask finetuning paradigm that improves pretrained models without shared data, outperforming existing models like RoBERTa across diverse datasets.
Contribution
It presents a novel distributed multitask finetuning method that enhances pretrained models and enables continual improvement without shared data or extensive communication.
Findings
Attains strong performance on all of the datasets it was trained on.
Outperforms RoBERTa by 2.33 points on average when training and testing on 35 diverse datasets.
Supports continual model evolution through recycling finetuned models.
Abstract
We propose a new paradigm to continually evolve pretrained models, denoted ColD Fusion. It provides the benefits of multitask learning but leverages distributed computation with limited communication and eliminates the need for shared data. Consequentially, ColD Fusion can give rise to a synergistic loop, where finetuned models can be recycled to continually improve the pretrained model they are based upon. We show that ColD Fusion yields comparable benefits to multitask training by producing a model that (a) attains strong performance on all of the datasets it was trained on; and (b) is a better starting point for finetuning on unseen datasets. We show that ColD Fusion outperforms RoBERTa and even previous multitask models. Specifically, when training and testing on 35 diverse datasets, ColD Fusion-based model outperforms RoBERTa by 2.33 points on average without any changes to the…
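The loop described in the abstract (contributors finetune a shared model on private data, and the results are recycled into a better shared model) can be illustrated with a toy sketch. The sketch below is not the authors' implementation: it swaps the pretrained language model for a small linear regressor, it assumes the fusion step is a plain element-wise average of the contributors' weights (the abstract does not name the fusion operator), and every function name, dataset, and hyperparameter in it is hypothetical.

```python
import numpy as np


def local_finetune(shared_w, X, y, lr=0.1, steps=20):
    """One contributor: start from the shared weights and take a few
    gradient steps on a private least-squares objective."""
    w = shared_w.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w


def fuse(models):
    """Assumed fusion operator: element-wise average of the weights."""
    return np.mean(models, axis=0)


def cold_fusion(shared_w, datasets, iterations=10):
    """Repeat: every contributor finetunes the current shared model on
    its own data, then the finetuned models are fused into the next
    shared model. Only weights are exchanged, never the data."""
    history = [shared_w.copy()]
    for _ in range(iterations):
        finetuned = [local_finetune(shared_w, X, y) for X, y in datasets]
        shared_w = fuse(finetuned)
        history.append(shared_w.copy())
    return shared_w, history


def mean_loss(w, datasets):
    """Average mean-squared error of weights w over all datasets."""
    return float(np.mean([np.mean((X @ w - y) ** 2) for X, y in datasets]))


def make_toy_datasets(n_tasks=5, n_samples=50, dim=8, seed=0):
    """Hypothetical related tasks: each task's true weights are a small
    perturbation of a common base vector."""
    rng = np.random.default_rng(seed)
    base = rng.normal(size=dim)
    datasets = []
    for _ in range(n_tasks):
        w_task = base + 0.1 * rng.normal(size=dim)
        X = rng.normal(size=(n_samples, dim))
        y = X @ w_task + 0.01 * rng.normal(size=n_samples)
        datasets.append((X, y))
    return datasets


if __name__ == "__main__":
    datasets = make_toy_datasets()
    w0 = np.zeros(8)
    w_final, _ = cold_fusion(w0, datasets)
    print("loss before:", mean_loss(w0, datasets))
    print("loss after: ", mean_loss(w_final, datasets))
```

In this toy setting the fused weights fit all tasks better than the starting point even though no contributor ever sees another contributor's data, which mirrors the "no shared data, limited communication" property claimed above.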
Code & Models
- 🤗 ibm-research/ColD-Fusion · model · 110 dl · ♡ 12
- 🤗 ibm-research/ColD-Fusion-itr9-seed1 · model · 1 dl
- 🤗 ibm-research/ColD-Fusion-itr9-seed2 · model · 2 dl
- 🤗 ibm-research/ColD-Fusion-itr9-seed0 · model · 3 dl
- 🤗 ibm-research/ColD-Fusion-itr9-seed3 · model · 1 dl
- 🤗 ibm-research/ColD-Fusion-itr9-seed4 · model · 2 dl
- 🤗 ibm-research/ColD-Fusion-itr10-seed1 · model · 3 dl
- 🤗 ibm-research/ColD-Fusion-itr10-seed2 · model · 2 dl
- 🤗 ibm-research/ColD-Fusion-itr10-seed0 · model · 2 dl
- 🤗 ibm-research/ColD-Fusion-itr10-seed3 · model · 1 dl
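A minimal sketch of how one of the checkpoints listed above might be loaded with the Hugging Face `transformers` library. The helper names `checkpoint_id` and `load_checkpoint` are hypothetical, the naming pattern is inferred only from the repository names in this list, and it is an assumption that each repository loads through `AutoModel` and `AutoTokenizer`. Not every iteration and seed combination is guaranteed to exist.

```python
def checkpoint_id(iteration=None, seed=None, org="ibm-research"):
    """Build a repository id following the naming pattern seen in the
    list above. With no arguments, return the main ColD-Fusion repo."""
    if iteration is None and seed is None:
        return f"{org}/ColD-Fusion"
    if iteration is None or seed is None:
        raise ValueError("give both iteration and seed, or neither")
    return f"{org}/ColD-Fusion-itr{iteration}-seed{seed}"


def load_checkpoint(model_id):
    """Download tokenizer and encoder weights for a checkpoint.
    Requires the `transformers` package and network access; assumes
    the repository is compatible with the Auto* classes."""
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model


if __name__ == "__main__":
    # Prints the id only; call load_checkpoint(...) to actually download.
    print(checkpoint_id(iteration=9, seed=1))
```

Since the abstract presents the model as a better starting point for finetuning, the loaded encoder would typically be finetuned on a downstream task in the same way as any other pretrained checkpoint.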
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · AI in cancer detection
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Attention Dropout · Linear Warmup With Linear Decay · Layer Normalization · Softmax · Dropout
