UniT: Multimodal Multitask Learning with a Unified Transformer

Ronghang Hu; Amanpreet Singh

arXiv:2102.10772·cs.CV·August 19, 2021·5 cites

Open Access 1 Repo

TL;DR

UniT is a unified transformer model that learns multiple tasks jointly across diverse domains, such as object detection and natural language understanding. By sharing a single model across all tasks, it achieves strong performance with fewer parameters.

Contribution

The paper presents UniT, a unified transformer architecture that jointly learns multiple tasks across different domains with shared parameters, unlike previous approaches that fine-tune a separate transformer for each task.

Findings

01. Achieves strong performance on 7 tasks across 8 datasets.

02. Uses significantly fewer parameters than separately fine-tuned task-specific models, because one set of parameters serves all tasks.

03. Handles diverse multimodal and multitask learning scenarios with a single model.
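
The parameter saving in finding 02 follows from simple counting: a shared model stores each encoder and the decoder once, while task-specific models store a copy per task. The sketch below illustrates that arithmetic. All component names and sizes (in millions of parameters) are hypothetical values chosen for illustration; they are not figures from the paper.

```python
# Hypothetical component sizes in millions of parameters (illustration only,
# not numbers reported for UniT).
ENCODERS = {"image": 40, "text": 110}
DECODER = 30
HEADS = {"detection": 1, "vqa": 2, "sentiment": 1}

# Which input modalities each (hypothetical) task needs.
TASK_MODALITIES = {
    "detection": ["image"],
    "vqa": ["image", "text"],
    "sentiment": ["text"],
}


def shared_total():
    """One copy of every encoder and the decoder, plus one head per task."""
    return sum(ENCODERS.values()) + DECODER + sum(HEADS.values())


def separate_total():
    """Each task gets its own encoders, its own decoder, and its own head."""
    total = 0
    for task, modalities in TASK_MODALITIES.items():
        total += sum(ENCODERS[m] for m in modalities) + DECODER + HEADS[task]
    return total


print("shared model:", shared_total())
print("task-specific models:", separate_total())
```

Under these assumed sizes the gap widens with every added task, since each new task contributes only a small output head to the shared model.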

Abstract

We propose UniT, a Unified Transformer model to simultaneously learn the most prominent tasks across different domains, ranging from object detection to natural language understanding and multimodal reasoning. Based on the transformer encoder-decoder architecture, our UniT model encodes each input modality with an encoder and makes predictions on each task with a shared decoder over the encoded input representations, followed by task-specific output heads. The entire model is jointly trained end-to-end with losses from each task. Compared to previous efforts on multi-task learning with transformers, we share the same model parameters across all tasks instead of separately fine-tuning task-specific models and handle a much higher variety of tasks across different domains. In our experiments, we learn 7 tasks jointly over 8 datasets, achieving strong performance on each task with…
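
The data flow described in the abstract (one encoder per input modality, a shared decoder, then a task-specific output head) can be sketched as follows. This is a toy illustration in plain Python, not the authors' implementation: `UnifiedModel`, the encoder and decoder functions, and the two task names are hypothetical stand-ins, and simple arithmetic takes the place of real transformer layers.

```python
class UnifiedModel:
    """Toy routing sketch: per-modality encoders, one shared decoder,
    per-task output heads. All components are hypothetical stand-ins."""

    def __init__(self, encoders, decoder, heads):
        self.encoders = encoders  # modality name -> encoder callable
        self.decoder = decoder    # a single decoder shared by all tasks
        self.heads = heads        # task name -> output head callable

    def forward(self, task, inputs):
        # Encode each modality present in the input and concatenate.
        encoded = []
        for modality in sorted(inputs):
            encoded.extend(self.encoders[modality](inputs[modality]))
        # The same decoder runs regardless of the task.
        hidden = self.decoder(encoded)
        # Only the final head depends on the task.
        return self.heads[task](hidden)


def image_encoder(pixels):
    # Stand-in for a visual encoder: scale pixel values to [0, 1].
    return [p / 255.0 for p in pixels]


def text_encoder(tokens):
    # Stand-in for a text encoder: one number per token.
    return [float(len(t)) for t in tokens]


def shared_decoder(encoded):
    # Stand-in for the shared decoder: pool the encoded sequence.
    return sum(encoded) / len(encoded)


model = UnifiedModel(
    encoders={"image": image_encoder, "text": text_encoder},
    decoder=shared_decoder,
    heads={
        "detection": lambda h: {"task": "detection", "score": h},
        "vqa": lambda h: {"task": "vqa", "score": h},
    },
)

# A vision-only task and a vision-and-language task use the same decoder.
det = model.forward("detection", {"image": [0, 255]})
vqa = model.forward("vqa", {"image": [255], "text": ["what", "is"]})
print(det)
print(vqa)
```

In joint training, a loss computed from each task's head output would update the components that produced it, so the shared decoder receives gradients from every task.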

Code & Models

Repositories

facebookresearch/mmf/tree/main/projects/unit
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Dropout · Layer Normalization · Attention Is All You Need · Dense Connections · Softmax · Adam