Derivative Free Weight-space Ensembling

Dean Ninalga

arXiv:2307.03506 · cs.CL · July 27, 2023

TL;DR

This paper introduces Derivative Free Weight-space Ensembling (DFWE), a few-sample task transfer method for open-domain dialogue. DFWE combines multiple expert language models by linearly interpolating their weights and uses a gradient-free optimization algorithm to choose the interpolation weighting.

Contribution

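The paper proposes a weight-space ensembling method that interpolates between more than two expert models. Each expert is trained on a different source task and then finetuned on the target task, and a gradient-free optimizer searches for a good interpolation weighting.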
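As an illustration of what "linearly interpolating between more than two models" means, the sketch below forms a convex combination of several models, parameter by parameter. It is a minimal sketch, not the paper's code: the `Weights` type (a dict from parameter name to a flat list of floats, standing in for real tensors) and the `interpolate` function are hypothetical, and all models are assumed to share the same architecture and parameter names.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a model's weights: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def interpolate(models: Sequence[Weights], alphas: Sequence[float]) -> Weights:
    """Return sum_i alphas[i] * models[i], computed parameter by parameter.

    Assumes every model has the same parameter names and shapes, and that the
    alphas form a convex combination (non-negative, summing to 1).
    """
    if len(models) != len(alphas):
        raise ValueError("need one interpolation weight per model")
    if any(a < 0 for a in alphas) or abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("alphas must be non-negative and sum to 1")
    merged: Weights = {}
    for name in models[0]:
        size = len(models[0][name])
        # Weighted sum of the j-th entry of this parameter across all models.
        merged[name] = [
            sum(a * m[name][j] for a, m in zip(alphas, models))
            for j in range(size)
        ]
    return merged


# Three toy "experts" with a single two-element parameter.
experts = [{"w": [1.0, 0.0]}, {"w": [0.0, 1.0]}, {"w": [2.0, 2.0]}]
merged = interpolate(experts, [0.5, 0.25, 0.25])
print(merged["w"])  # [1.0, 0.75]
```

Restricting the weights to the simplex keeps the merged model inside the region spanned by the experts; with two models this reduces to ordinary pairwise interpolation.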

Findings

01

DFWE outperforms standard pretrain-finetune methods on FETA-Friends.

02

The approach combines knowledge from multiple expert models, each trained on a different source task.

03

Gradient-free optimization efficiently finds a good interpolation weighting.

Abstract

Recent work suggests that interpolating between the weights of two specialized language models can transfer knowledge between tasks in a way that multi-task learning cannot. However, very few works have explored interpolation between more than two models, where each has a distinct knowledge base. In this paper, we introduce Derivative Free Weight-space Ensembling (DFWE), a new few-sample task transfer approach for open-domain dialogue. Our framework creates a set of diverse expert language models trained using a predefined set of source tasks. Next, we finetune each of the expert models on the target task, approaching the target task from several distinct knowledge bases. Finally, we linearly interpolate between the model weights using a gradient-free optimization algorithm to efficiently find a good interpolation weighting. We demonstrate the effectiveness of the method on FETA-Friends…
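The final step, searching for interpolation weights using only evaluations of the merged model, can be sketched with a simple derivative-free optimizer. The excerpt does not say which gradient-free algorithm the paper uses, so the (1+1) evolution strategy below is a swapped-in stand-in, and `search_alphas`, `softmax`, and `toy_score` are hypothetical names. In practice `score` would interpolate the finetuned experts with the given weights and return a validation metric on the target task; here a toy quadratic objective takes its place.

```python
import math
import random
from typing import Callable, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Map unconstrained logits to non-negative weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def search_alphas(
    score: Callable[[List[float]], float],
    n_models: int,
    iters: int = 200,
    sigma: float = 0.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """(1+1) evolution strategy over softmax logits; higher score is better.

    Only calls score(); no gradients are computed.
    """
    rng = random.Random(seed)
    logits = [0.0] * n_models  # start from the uniform average of all models
    best_alphas = softmax(logits)
    best = score(best_alphas)
    for _ in range(iters):
        # Propose a Gaussian perturbation of the current logits.
        candidate = [x + rng.gauss(0.0, sigma) for x in logits]
        alphas = softmax(candidate)
        value = score(alphas)
        if value >= best:  # accept the proposal if it does not hurt the score
            logits, best, best_alphas = candidate, value, alphas
    return best_alphas, best


def toy_score(alphas: List[float]) -> float:
    """Hypothetical objective: closeness to a fixed weighting (stands in for validation accuracy)."""
    target = [0.7, 0.2, 0.1]
    return -sum((a - t) ** 2 for a, t in zip(alphas, target))


alphas, best = search_alphas(toy_score, n_models=3)
print([round(a, 2) for a in alphas], round(best, 4))
```

The softmax parameterization keeps every candidate a valid convex combination, so the optimizer can take unconstrained steps. Because each step needs only a forward evaluation, the cost scales with the number of score calls rather than with backpropagation through the merged model.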

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications