MultiBench: Multiscale Benchmarks for Multimodal Representation Learning

Paul Pu Liang; Yiwei Lyu; Xiang Fan; Zetian Wu; Yun Cheng; Jason Wu; Leslie Chen; Peter Wu; Michelle A. Lee; Yuke Zhu; Ruslan Salakhutdinov; Louis-Philippe Morency

arXiv:2107.07502 · cs.LG · November 11, 2021 · 22 cites

TL;DR

MultiBench is a comprehensive benchmark suite for multimodal representation learning, covering diverse datasets, modalities, and tasks, aimed at advancing research in generalization, robustness, and efficiency.

Contribution

The paper introduces a large-scale, unified benchmark with standardized evaluation protocols and implementations, facilitating progress and reproducibility in multimodal learning research.

Findings

01 Applying methods proposed in other research areas improves state-of-the-art performance on 9 of the 15 datasets.

02 MultiBench enables holistic evaluation of generalization, complexity, and robustness.

03 The benchmark highlights open challenges such as scalability and handling noisy or missing modalities.
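To make the "missing modalities" finding concrete, the following is a minimal sketch of how robustness to a dropped modality could be measured. It does not use the MultiBench code or API: the toy data, the averaging late-fusion rule, and every function name here (`make_toy_data`, `late_fusion_predict`, `accuracy`, `robustness_report`) are hypothetical and chosen only for illustration.

```python
import random


def make_toy_data(n, seed=0):
    """Build a toy two-modality dataset (an assumption, not a MultiBench dataset).

    Each modality is the label signal (+1 or -1) plus Gaussian noise;
    modality A is less noisy than modality B.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        signal = 1.0 if label == 1 else -1.0
        a = signal + rng.gauss(0.0, 0.5)  # modality A
        b = signal + rng.gauss(0.0, 1.0)  # modality B
        data.append((a, b, label))
    return data


def late_fusion_predict(a, b):
    """Average the available per-modality scores and threshold at zero.

    A value of None marks a missing modality and is skipped.
    """
    scores = [s for s in (a, b) if s is not None]
    if not scores:
        return 0  # nothing observed: fall back to class 0
    return 1 if sum(scores) / len(scores) > 0 else 0


def accuracy(data, drop=None):
    """Accuracy when the modality named by `drop` ("a" or "b") is missing."""
    correct = 0
    for a, b, label in data:
        if drop == "a":
            a = None
        elif drop == "b":
            b = None
        correct += int(late_fusion_predict(a, b) == label)
    return correct / len(data)


def robustness_report(data):
    """Compare accuracy with all modalities against each single-modality drop."""
    return {
        "full": accuracy(data),
        "missing_a": accuracy(data, drop="a"),
        "missing_b": accuracy(data, drop="b"),
    }


if __name__ == "__main__":
    report = robustness_report(make_toy_data(2000))
    for setting, acc in report.items():
        print(f"{setting}: {acc:.3f}")
```

In this toy setup, dropping the cleaner modality A hurts more than dropping the noisier modality B, which is the kind of per-modality sensitivity a robustness evaluation is meant to expose.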

Abstract

Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, human-computer interaction, and healthcare. Unfortunately, multimodal research has seen limited resources to study (1) generalization across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing modalities. In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. MultiBench provides an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning