I Have an Attention Bridge to Sell You: Generalization Capabilities of Modular Translation Architectures
Timothee Mickus, Raúl Vázquez, Joseph Attieh

TL;DR
This paper investigates whether modular translation architectures, specifically attention bridges, enhance generalization and translation quality, finding that non-modular models often perform better or equally well within the same computational constraints.
Contribution
The study provides a comprehensive comparison of modular and non-modular translation models, highlighting that modularity does not necessarily improve translation quality or generalization.
Findings
Non-modular architectures often outperform modular ones at the same computational budget.
Modular approaches do not significantly improve translation quality or generalization.
Non-modular models are generally comparable or preferable to modular designs.
Abstract
Modularity is a paradigm of machine translation with the potential of bringing forth models that are large at training time and small during inference. Within this field of study, modular approaches, and in particular attention bridges, have been argued to improve the generalization capabilities of models by fostering language-independent representations. In the present paper, we study whether modularity affects translation quality, as well as how well modular architectures generalize across different evaluation scenarios. For a given computational budget, we find non-modular architectures to be always comparable or preferable to all modular designs we study.
Taxonomy
Topics: Semantic Web and Ontologies
