Composable Sparse Fine-Tuning for Cross-Lingual Transfer

Alan Ansell, Edoardo Maria Ponti, Anna Korhonen, Ivan Vulić

arXiv:2110.07560 · cs.CL · February 10, 2023 · 1 citation

TL;DR

This paper introduces a sparse fine-tuning method that combines the modularity of adapters with the expressivity of sparse fine-tuning, enabling efficient, composable, and effective cross-lingual transfer without increasing model size.

Contribution

It proposes a new sparse, composable fine-tuning technique based on a simple variant of the Lottery Ticket Hypothesis, which outperforms adapters in zero-shot cross-lingual transfer without adding parameters.
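The lottery-ticket selection idea can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: a quadratic loss stands in for a real training objective, the k parameters whose values changed most during full fine-tuning are kept, all weights are reset to their pretrained values, and only the kept entries are trained again. The helper names `train` and `lottery_ticket_sft`, the learning rate, and the step count are hypothetical.

```python
import numpy as np

# Hypothetical toy setup: the "model" is a flat parameter vector and the
# task loss is 0.5 * ||theta - target||^2, standing in for a real objective.
def train(theta0, target, mask=None, lr=0.1, steps=200):
    """Plain gradient descent; if `mask` is given, entries where mask == 0 stay frozen."""
    theta = theta0.copy()
    for _ in range(steps):
        grad = theta - target          # gradient of the toy quadratic loss
        if mask is not None:
            grad = grad * mask         # zero the update outside the mask
        theta -= lr * grad
    return theta

def lottery_ticket_sft(theta0, target, k):
    # Phase 1: fine-tune every parameter starting from the pretrained values.
    theta1 = train(theta0, target)
    # Keep the k parameters whose values moved the most.
    idx = np.argsort(np.abs(theta1 - theta0))[-k:]
    mask = np.zeros_like(theta0)
    mask[idx] = 1.0
    # Phase 2: reset to the pretrained values and re-train only the masked entries.
    theta2 = train(theta0, target, mask=mask)
    # The sparse fine-tuning is the difference vector, nonzero only where mask == 1.
    return theta2 - theta0, mask

rng = np.random.default_rng(0)
theta0 = rng.normal(size=10)                          # stand-in pretrained weights
target = theta0 + 0.01 * rng.normal(size=10)          # small drift everywhere...
target[[2, 5, 7]] += np.array([3.0, -2.0, 1.5])       # ...large change at three entries
phi, mask = lottery_ticket_sft(theta0, target, k=3)
print(np.flatnonzero(mask).tolist())                  # [2, 5, 7]
print(np.count_nonzero(phi))                          # 3
```

Because only three coordinates carry a large change, the mask recovers exactly those entries, and the resulting difference vector is zero everywhere else. A real run would apply the same select, reset, and retrain pattern to a pretrained network's weights.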

Findings

1. Outperforms adapters on zero-shot cross-lingual transfer benchmarks.
2. Sparsity prevents interference and overfitting when fine-tunings are composed.
3. The method neither increases the parameter count nor alters the model architecture.

Abstract

Fine-tuning the entire set of parameters of a large pretrained model has become the mainstream approach for transfer learning. To increase its efficiency and prevent catastrophic forgetting and interference, techniques like adapters and sparse fine-tuning have been developed. Adapters are modular, as they can be combined to adapt a model towards different facets of knowledge (e.g., dedicated language and/or task adapters). Sparse fine-tuning is expressive, as it controls the behavior of all model components. In this work, we introduce a new fine-tuning method with both these desirable properties. In particular, we learn sparse, real-valued masks based on a simple variant of the Lottery Ticket Hypothesis. Task-specific masks are obtained from annotated data in a source language, and language-specific masks from masked language modeling in a target language. Both these masks can then be…
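The abstract is cut off before it states how the masks are combined. Assuming additive composition, where each sparse fine-tuning is stored as a difference from the pretrained weights and applied by summation, a minimal sketch looks like this. The names `sparse_diff` and `compose` and all values are hypothetical; the two difference vectors stand in for a language-specific mask learned with masked language modeling and a task-specific mask learned on source-language data.

```python
import numpy as np

def sparse_diff(size, indices, values):
    """Build a dense difference vector that is zero outside `indices` (hypothetical helper)."""
    phi = np.zeros(size)
    phi[indices] = values
    return phi

def compose(theta0, *diffs):
    """Apply several sparse fine-tunings by adding their difference vectors
    to the pretrained weights; shape and parameter count stay unchanged."""
    theta = theta0.copy()
    for phi in diffs:
        theta += phi
    return theta

theta0 = np.arange(8, dtype=float)                # stand-in pretrained weights
phi_lang = sparse_diff(8, [0, 3], [0.5, -0.25])   # stands in for a target-language mask
phi_task = sparse_diff(8, [3, 6], [1.0, 2.0])     # stands in for a source-language task mask
theta = compose(theta0, phi_lang, phi_task)
print(theta.tolist())  # [0.5, 1.0, 2.0, 3.75, 4.0, 5.0, 8.0, 7.0]
```

Since composition only adds values into existing entries, the composed model keeps the pretrained shape; an entry touched by both masks (index 3 here) simply accumulates both updates, which is why keeping the masks sparse limits interference between them.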

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques