Exceeding the Limits of Visual-Linguistic Multi-Task Learning
Cameron R. Wolfe, Keld T. Lundgaard

TL;DR
This paper shows that multi-task learning can be scaled to 1000 classification tasks built from e-commerce product data. It does so using a multi-modal (text and image) transformer and new heuristics for allocating task-specific capacity.
Contribution
It introduces a scalable methodology for multi-task learning on 1000 tasks, including new heuristics such as DyPa for allocating task-specific parameters efficiently.
Findings
Successful training of a single model on 1000 tasks
Identification of best practices for large-scale MTL
Introduction of DyPa heuristic for task-specific capacity
Abstract
By leveraging large amounts of product data collected across hundreds of live e-commerce websites, we construct 1000 unique classification tasks that share similarly-structured input data, comprised of both text and images. These classification tasks focus on learning the product hierarchy of different e-commerce websites, causing many of them to be correlated. Adopting a multi-modal transformer model, we solve these tasks in unison using multi-task learning (MTL). Extensive experiments are presented over an initial 100-task dataset to reveal best practices for "large-scale MTL" (i.e., MTL with more than 100 tasks). From these experiments, a final, unified methodology is derived, which is composed of both best practices and new proposals such as DyPa, a simple heuristic for automatically allocating task-specific parameters to tasks that could benefit from extra capacity. Using our…
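The abstract does not say how DyPa decides which tasks receive extra parameters. The sketch below only illustrates the general idea of automatically granting task-specific capacity to tasks that seem to need it. It assumes a shared encoder with one classification head per task and uses a hypothetical rule: the `budget` tasks with the highest validation loss each gain an extra task-specific layer. `TaskHead`, `allocate_extra_capacity`, and the loss-ranking criterion are illustrative assumptions, not the paper's algorithm.

```python
from dataclasses import dataclass


@dataclass
class TaskHead:
    """Hypothetical per-task output module sitting on top of a shared encoder."""
    name: str
    num_classes: int
    extra_layers: int = 0  # task-specific capacity beyond the linear classifier


def allocate_extra_capacity(heads, val_loss, budget, layers_per_task=1):
    """Give extra task-specific layers to the `budget` tasks with the highest validation loss.

    Assumption (not from the paper): a high validation loss under the shared
    model signals a task that may benefit from extra capacity. This is an
    illustrative stand-in for a capacity-allocation heuristic, not DyPa itself.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Rank tasks from worst (highest loss) to best; break ties by name for determinism.
    ranked = sorted(heads, key=lambda h: (-val_loss[h.name], h.name))
    chosen = ranked[:budget]
    for head in chosen:
        head.extra_layers += layers_per_task
    return [h.name for h in chosen]


# Usage with made-up task names and losses, for illustration only.
heads = [TaskHead("site_a", 12), TaskHead("site_b", 40), TaskHead("site_c", 7)]
losses = {"site_a": 0.41, "site_b": 1.37, "site_c": 0.92}
print(allocate_extra_capacity(heads, losses, budget=2))  # ['site_b', 'site_c']
```

Ranking by a single validation metric keeps the rule cheap enough to rerun across 1000 tasks. A real heuristic might instead compare each task against a single-task baseline or re-evaluate allocations during training.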
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
