Derivative Free Weight-space Ensembling
Dean Ninalga

TL;DR
This paper introduces Derivative Free Weight-space Ensembling (DFWE), a method that combines multiple expert models by linearly interpolating their weights, with the interpolation coefficients chosen by gradient-free optimization, to improve few-sample task transfer in open-domain dialogue.
Contribution
The paper proposes a new gradient-free weight interpolation method for ensembling multiple models trained on different tasks, enhancing transfer learning capabilities.
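As a worked form of the interpolation, under assumed notation that does not come from the source: given N finetuned expert weight vectors \theta_1, ..., \theta_N and scalar coefficients \alpha_1, ..., \alpha_N, the merged model can be written as

$$
\theta_{\text{merged}} = \sum_{i=1}^{N} \alpha_i \, \theta_i
$$

The coefficients \alpha_i are what the gradient-free optimizer searches over, scoring each candidate by how well the merged model does on the target task. Whether the coefficients are constrained (for example, to be non-negative or to sum to one) is not stated in this summary.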
Findings
DFWE outperforms standard pretrain-finetune methods on FETA-Friends.
The approach combines knowledge from multiple expert models, each trained on a different source task and then finetuned on the target task.
Gradient-free optimization efficiently finds a good interpolation weighting.
Abstract
Recent work suggests that interpolating between the weights of two specialized language models can transfer knowledge between tasks in a way that multi-task learning cannot. However, very few have explored interpolation between more than two models, where each has a distinct knowledge base. In this paper, we introduce Derivative Free Weight-space Ensembling (DFWE), a new few-sample task transfer approach for open-domain dialogue. Our framework creates a set of diverse expert language models trained using a predefined set of source tasks. Next, we finetune each of the expert models on the target task, approaching the target task from several distinct knowledge bases. Finally, we linearly interpolate between the model weights using a gradient-free-optimization algorithm, to efficiently find a good interpolation weighting. We demonstrate the effectiveness of the method on FETA-Friends…
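The final step described in the abstract, searching for interpolation coefficients without gradients, can be sketched roughly as follows. This is a minimal illustration and not the author's implementation: the abstract does not name the optimizer, so plain random search over the probability simplex stands in for it. Models are reduced to flat lists of floats, a toy squared-error loss replaces target-task validation loss, and every function name here is hypothetical.

```python
import random

def interpolate(experts, alphas):
    """Weighted sum of expert weight vectors (each a flat list of floats)."""
    dim = len(experts[0])
    return [sum(a * w[i] for a, w in zip(alphas, experts)) for i in range(dim)]

def sample_simplex(n, rng):
    """Draw n non-negative coefficients that sum to 1 (an assumed constraint)."""
    raw = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]

def random_search(experts, loss_fn, n_trials=200, seed=0):
    """Gradient-free stand-in: keep the best of n_trials random weightings."""
    rng = random.Random(seed)
    n = len(experts)
    best_alphas = [1.0 / n] * n  # start from the uniform average
    best_loss = loss_fn(interpolate(experts, best_alphas))
    for _ in range(n_trials):
        alphas = sample_simplex(n, rng)
        loss = loss_fn(interpolate(experts, alphas))
        if loss < best_loss:
            best_alphas, best_loss = alphas, loss
    return best_alphas, best_loss

# Toy setup: three "experts" in a 2-D weight space and a made-up target.
experts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [0.7, 0.5]

def toy_loss(weights):
    """Squared distance to the target; a placeholder for validation loss."""
    return sum((w - t) ** 2 for w, t in zip(weights, target))

alphas, loss = random_search(experts, toy_loss)
```

Only `loss_fn` is ever evaluated, never differentiated, so the evaluation metric does not need to be differentiable. In a real setting each evaluation would mean running the merged model on held-out target-task data, which makes the number of trials the main cost.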
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
