LegoNN: Building Modular Encoder-Decoder Models
Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe, Florian Metze, Luke Zettlemoyer, and Abdelrahman Mohamed

TL;DR
LegoNN introduces a modular encoder-decoder framework enabling reuse and transfer of components across tasks like MT and ASR without fine-tuning, through a standardized interface grounded in marginal distributions.
Contribution
The paper presents LegoNN, a novel modular architecture for encoder-decoder models that allows component reuse across different tasks and languages without retraining.
Findings
A decoder trained on German-English MT was reused for ASR and for Romanian-English MT without fine-tuning.
Fine-tuning LegoNN improves Romanian-English BLEU by 1.5 points.
Assembled a multi-module LegoNN ASR model achieving 19.5% WER reduction.
Abstract
State-of-the-art encoder-decoder models (e.g. for machine translation (MT) or automatic speech recognition (ASR)) are constructed and trained end-to-end as an atomic unit. No component of the model can be (re-)used without the others, making it impossible to share parts, e.g. a high resourced decoder, across tasks. We describe LegoNN, a procedure for building encoder-decoder architectures in a way so that its parts can be applied to other tasks without the need for any fine-tuning. To achieve this reusability, the interface between encoder and decoder modules is grounded to a sequence of marginal distributions over a pre-defined discrete vocabulary. We present two approaches for ingesting these marginals; one is differentiable, allowing the flow of gradients across the entire network, and the other is gradient-isolating. To enable the portability of decoder modules between MT tasks for…
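The interface described in the abstract can be illustrated with a small sketch. It assumes the encoder emits one row of logits per position, and it shows one plausible form for each ingestion style: an expected embedding under the marginal (differentiable in an autodiff framework) and a top-k index lookup (gradient-isolating, because discrete indices carry no gradient). The function names, the toy vocabulary size, and the embedding table are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def encoder_marginals(logit_rows):
    """Map per-position encoder logits to marginal distributions over the vocabulary."""
    return [softmax(row) for row in logit_rows]


def weighted_embedding(marginals, embedding):
    """Differentiable-style ingestion (assumed form): the expected embedding
    under each marginal, i.e. a probability-weighted sum of embedding rows."""
    dim = len(embedding[0])
    return [
        [sum(p * embedding[v][d] for v, p in enumerate(dist)) for d in range(dim)]
        for dist in marginals
    ]


def top_k_ingest(marginals, embedding, k=2):
    """Gradient-isolating-style ingestion (assumed form): keep only the indices
    of the k most probable tokens and look up their embeddings. The decoder
    sees discrete indices, so no gradient would reach the encoder."""
    out = []
    for dist in marginals:
        top = sorted(range(len(dist)), key=lambda v: dist[v], reverse=True)[:k]
        out.append([embedding[v] for v in top])
    return out


# Toy example: 2 positions, vocabulary of 3 tokens, 2-dimensional embeddings.
logits = [[2.0, 0.0, 0.0], [0.0, 3.0, 1.0]]
embedding = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

marginals = encoder_marginals(logits)
soft_inputs = weighted_embedding(marginals, embedding)
hard_inputs = top_k_ingest(marginals, embedding, k=1)
```

Because both functions consume only the marginals and a decoder-side embedding table, any encoder that produces distributions over the same vocabulary could, in principle, feed the same decoder. That property is what the abstract's reuse claim relies on.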
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
