BERTGEN: Multi-task Generation through BERT

Faidon Mitzalis; Ozan Caglayan; Pranava Madhyastha; Lucia Specia

arXiv:2106.03484 · cs.CL · June 8, 2021

TL;DR

BERTGEN is a multi-task, decoder-only generative model that combines multimodal (VL-BERT) and multilingual (M-BERT) pretraining for image captioning, machine translation, and multimodal machine translation. It outperforms many strong baselines and is competitive with supervised models in zero-shot language generation.

Contribution

It introduces BERTGEN, a novel model that fuses VL-BERT and M-BERT for multi-task generation, demonstrating effective transfer and zero-shot capabilities.

Findings

01 Outperforms many strong baselines across the tasks explored

02 Shows zero-shot language generation that is competitive with supervised counterparts

03 Benefits substantially from multi-task training

Abstract

We present BERTGEN, a novel generative, decoder-only model which extends BERT by fusing multimodal and multilingual pretrained models VL-BERT and M-BERT, respectively. BERTGEN is auto-regressively trained for language generation tasks, namely image captioning, machine translation and multimodal machine translation, under a multitask setting. With a comprehensive set of evaluations, we show that BERTGEN outperforms many strong baselines across the tasks explored. We also show BERTGEN's ability for zero-shot language generation, where it exhibits competitive performance to supervised counterparts. Finally, we conduct ablation studies which demonstrate that BERTGEN substantially benefits from multi-tasking and effectively transfers relevant inductive biases from the pre-trained models.
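The abstract states that BERTGEN is trained auto-regressively but does not spell out the mechanics. As a rough illustration only, one common way to train a BERT-style masked language model for left-to-right generation is to unroll each target sequence into one training example per token, where the model sees the source, the target prefix, and a mask slot to fill. The sketch below assumes that scheme; the `unroll` helper, the `[MASK]` token name, and the `<img>` placeholder are hypothetical and are not taken from this page or the official repository.

```python
from typing import List, Tuple

MASK = "[MASK]"  # assumed mask token name, for illustration only


def unroll(source: List[str], target: List[str]) -> List[Tuple[List[str], str]]:
    """Build one (context, label) training example per target token.

    Each context holds the source tokens, the target prefix generated so far,
    and a trailing mask slot; the label is the token that fills the slot.
    """
    examples = []
    for i, token in enumerate(target):
        context = source + target[:i] + [MASK]
        examples.append((context, token))
    return examples


# Hypothetical captioning input: "<img>" stands in for visual features.
pairs = unroll(["<img>"], ["a", "dog", "runs"])
for context, label in pairs:
    print(context, "->", label)
```

At inference time the same loop can run with the predicted token appended to the prefix at each step, which is what makes the procedure auto-regressive.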

Code & Models

Repositories

ImperialNLP/BertGen
PyTorch · Official

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Visual-Linguistic BERT · Adam · Linear Warmup With Linear Decay · Residual Connection · WordPiece · Attention Dropout