LegoNN: Building Modular Encoder-Decoder Models

Siddharth Dalmia; Dmytro Okhonko; Mike Lewis; Sergey Edunov; Shinji Watanabe; Florian Metze; Luke Zettlemoyer; and Abdelrahman Mohamed

arXiv:2206.03318 · cs.CL · July 12, 2023

Open Access

TL;DR

LegoNN introduces a modular encoder-decoder framework enabling reuse and transfer of components across tasks like MT and ASR without fine-tuning, through a standardized interface grounded in marginal distributions.

Contribution

The paper presents LegoNN, a novel modular architecture for encoder-decoder models that allows component reuse across different tasks and languages without retraining.

Findings

01 A German-English MT decoder was reused for ASR and Romanian-English MT without fine-tuning.

02 Fine-tuning LegoNN improves Romanian-English BLEU by 1.5 points.

03 A LegoNN ASR model assembled from multiple modules achieves a 19.5% WER reduction.

Abstract

State-of-the-art encoder-decoder models (e.g. for machine translation (MT) or automatic speech recognition (ASR)) are constructed and trained end-to-end as an atomic unit. No component of the model can be (re-)used without the others, making it impossible to share parts, e.g. a high resourced decoder, across tasks. We describe LegoNN, a procedure for building encoder-decoder architectures in a way so that its parts can be applied to other tasks without the need for any fine-tuning. To achieve this reusability, the interface between encoder and decoder modules is grounded to a sequence of marginal distributions over a pre-defined discrete vocabulary. We present two approaches for ingesting these marginals; one is differentiable, allowing the flow of gradients across the entire network, and the other is gradient-isolating. To enable the portability of decoder modules between MT tasks for…
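The interface described in the abstract can be illustrated with a minimal sketch. It assumes the encoder emits one softmax distribution per output position over a shared vocabulary, and shows two plausible ways a decoder might ingest those marginals: an expected embedding (every step is a smooth function of the probabilities, so gradients could flow in an autodiff framework) and a top-k lookup (only integer token ids cross the interface, so no gradient would reach the encoder). The function names, the toy vocabulary and embedding table, and the specific choice of expected embedding and top-k selection are assumptions made for illustration; the abstract does not specify how either approach is implemented.

```python
import math


def softmax(logits):
    """Turn one position's logits into a distribution over the vocabulary."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_embedding(marginals, embedding):
    """Differentiable-style ingestion (assumed form): the expected embedding
    under each marginal distribution.

    marginals: T distributions, each of length V
    embedding: hypothetical V x D decoder-side embedding table
    """
    dim = len(embedding[0])
    return [
        [sum(p[v] * embedding[v][d] for v in range(len(p))) for d in range(dim)]
        for p in marginals
    ]


def top_k_ingest(marginals, embedding, k):
    """Gradient-isolating-style ingestion (assumed form): keep the k most
    likely token ids per position and look up their embeddings. Only integer
    ids pass through, so nothing here is differentiable w.r.t. the encoder.
    """
    out = []
    for p in marginals:
        ids = sorted(range(len(p)), key=lambda v: p[v], reverse=True)[:k]
        out.append([embedding[v] for v in ids])
    return out


# Toy setup: vocabulary size V=4, embedding size D=2, sequence length T=2.
encoder_logits = [[2.0, 0.5, 0.1, -1.0], [0.0, 0.0, 3.0, 0.0]]
marginals = [softmax(row) for row in encoder_logits]
embedding = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]

soft_inputs = weighted_embedding(marginals, embedding)  # T x D
hard_inputs = top_k_ingest(marginals, embedding, k=2)   # T x k x D
```

In both variants the decoder depends only on the vocabulary and the marginals, not on the encoder's hidden states, which is what would let a decoder trained with one encoder be paired with another that emits distributions over the same vocabulary.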

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling