CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks

Tejas Srinivasan; Ting-Yun Chang; Leticia Leonor Pinto Alva; Georgios Chochlakis; Mohammad Rostami; Jesse Thomason

arXiv:2206.09059 · cs.CL · November 28, 2022 · 23 citations

TL;DR

CLiMB introduces a new benchmark for continual learning in vision-and-language tasks, highlighting the challenges of learning multiple modalities and the limitations of current CL methods in enabling knowledge transfer.

Contribution

The paper presents CLiMB, a comprehensive benchmark for multimodal continual learning, including implementations of CL algorithms and a modified Vision-Language Transformer model.

Findings

01  Common CL methods reduce forgetting in multimodal learning

02  Current CL methods do not support cross-task knowledge transfer

03  CLiMB enables systematic evaluation of multimodal continual learning

Abstract

Current state-of-the-art vision-and-language models are evaluated on tasks either individually or in a multi-task setting, overlooking the challenges of continually learning (CL) tasks as they arrive. Existing CL benchmarks have facilitated research on task adaptation and mitigating "catastrophic forgetting", but are limited to vision-only and language-only tasks. We present CLiMB, a benchmark to study the challenge of learning multimodal tasks in a CL setting, and to systematically evaluate how upstream continual learning can rapidly generalize to new multimodal and unimodal tasks. CLiMB includes implementations of several CL algorithms and a modified Vision-Language Transformer (ViLT) model that can be deployed on both multimodal and unimodal tasks. We find that common CL methods can help mitigate forgetting during multimodal task learning, but do not enable cross-task knowledge…

Code & Models

Repositories

glamor-usc/climb (PyTorch, official)

Videos

CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Linear Layer · Softmax · Dropout · Dense Connections · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Multi-Head Attention · Byte Pair Encoding · Label Smoothing