One Model To Learn Them All
Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki, Parmar, Llion Jones, Jakob Uszkoreit

TL;DR
This paper introduces a versatile single deep learning model capable of handling multiple diverse tasks across domains, reducing the need for task-specific architectures and tuning.
Contribution
The authors propose a unified model architecture combining various computational blocks, trained concurrently on multiple tasks, demonstrating broad applicability and improved performance through joint training.
Findings
Multi-task training benefits low-data tasks significantly.
Adding diverse blocks never harms and often improves performance.
The model performs well across vision, language, and speech tasks.
Abstract
Deep learning yields great results across many fields, from speech recognition, image classification, to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains. It contains convolutional layers, an attention mechanism, and sparsely-gated layers. Each of these computational blocks is crucial for a subset of the tasks we train on. Interestingly, even if a block is not crucial for a task, we observe that adding it never hurts performance and in most…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
