MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning

Constantin Eichenberg; Sidney Black; Samuel Weinbach; Letitia Parcalabescu; Anette Frank

arXiv:2112.05253·cs.CV·October 26, 2022·5 cites

TL;DR

MAGMA is a simple adapter-based finetuning method that adds multimodal capabilities to generative language models. It achieves state-of-the-art results on the OKVQA benchmark with significantly less pretraining data, while preserving the knowledge the language model acquired during language pretraining.

Contribution

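The paper presents an end-to-end multimodal finetuning approach that leaves the language model weights unchanged. Training uses a single language modeling objective, and the encyclopedic knowledge and in-context learning abilities gained in language pretraining carry over to the multimodal model.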

Findings

01 Outperforms Frozen on open-ended generative tasks

02 Achieves state-of-the-art on OKVQA benchmark

03 Requires significantly less pretraining data

Abstract

Large-scale pretraining is fast becoming the norm in Vision-Language (VL) modeling. However, prevailing VL approaches are limited by the requirement for labeled data and the use of complex multi-step pretraining objectives. We present MAGMA - a simple method for augmenting generative language models with additional modalities using adapter-based finetuning. Building on Frozen, we train a series of VL models that autoregressively generate text from arbitrary combinations of visual and textual input. The pretraining is entirely end-to-end using a single language modeling objective, simplifying optimization compared to previous approaches. Importantly, the language model weights remain unchanged during training, allowing for transfer of encyclopedic knowledge and in-context learning abilities from language pretraining. MAGMA outperforms Frozen on open-ended generative tasks, achieving…
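The core idea in the abstract, training small adapter modules while the language model weights stay unchanged, can be illustrated with a minimal sketch. The abstract does not describe the adapter architecture, so the code below assumes a common bottleneck design (down-projection, ReLU, up-projection, residual connection) with a zero-initialised up-projection. The names `FrozenLinear`, `BottleneckAdapter`, and `adapted_layer` are hypothetical and do not come from the paper or the Aleph-Alpha/magma repository.

```python
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    """Elementwise ReLU."""
    return [max(0.0, a) for a in v]


class FrozenLinear:
    """Stand-in for a pretrained language-model layer (assumption).

    Its weights are fixed at construction and never updated.
    """

    def __init__(self, dim, rng):
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, x):
        return matvec(self.W, x)


class BottleneckAdapter:
    """Hypothetical trainable adapter: h + W_up @ relu(W_down @ h)."""

    def __init__(self, dim, bottleneck, rng):
        self.W_down = [
            [rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(bottleneck)
        ]
        # Zero up-projection: the adapter is the identity map before training,
        # so the frozen layer's behaviour is preserved at initialisation.
        self.W_up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, h):
        delta = matvec(self.W_up, relu(matvec(self.W_down, h)))
        return [a + d for a, d in zip(h, delta)]


def adapted_layer(base, adapter, x):
    """Apply the frozen layer, then the adapter on its output."""
    return adapter(base(x))


rng = random.Random(0)
base = FrozenLinear(4, rng)
adapter = BottleneckAdapter(4, 2, rng)
x = [1.0, -2.0, 0.5, 3.0]

# Before any adapter update, the adapted layer reproduces the frozen layer.
assert adapted_layer(base, adapter, x) == base(x)
```

In this sketch only `W_down` and `W_up` would receive gradient updates during finetuning; `base.W` is never touched, which is what lets knowledge from language pretraining carry over.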

Code & Models

Repositories

Aleph-Alpha/magma
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling

Methods: Simple Visual Language Model