MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A. Lee, Yuke Zhu, Ruslan Salakhutdinov, Louis-Philippe Morency

TL;DR
MultiBench is a comprehensive benchmark suite for multimodal representation learning, covering diverse datasets, modalities, and tasks, aimed at advancing research in generalization, robustness, and efficiency.
Contribution
It introduces a large-scale, unified benchmark with standardized evaluation protocols and implementations, facilitating progress and reproducibility in multimodal learning research.
Findings
Simply applying methods proposed in different research areas improves state-of-the-art performance on 9 of the 15 datasets.
MultiBench enables holistic evaluation of generalization, complexity, and robustness.
The benchmark highlights challenges like scalability and handling noisy or missing modalities.
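As a rough illustration of what evaluating robustness to noisy or missing modalities can look like, the sketch below perturbs one of two synthetic modalities and tracks the accuracy of a toy late-fusion predictor. It is a minimal sketch under stated assumptions, not MultiBench's pipeline or API: the data is synthetic, and every function name (make_toy_data, late_fusion_predict, robustness_curve, missing_modality_accuracy) is hypothetical and introduced here only for illustration.

```python
import numpy as np


def make_toy_data(n=400, seed=0):
    """Build two synthetic modalities that both carry a binary label (hypothetical data)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    signal = (2 * y - 1)[:, None].astype(float)  # -1.0 or +1.0 per sample
    mod_a = signal + 0.5 * rng.standard_normal((n, 4))
    mod_b = signal + 0.5 * rng.standard_normal((n, 6))
    return mod_a, mod_b, y


def late_fusion_predict(mod_a, mod_b):
    """Toy late fusion: average the per-modality scores and threshold at zero."""
    score = 0.5 * mod_a.mean(axis=1) + 0.5 * mod_b.mean(axis=1)
    return (score > 0).astype(int)


def accuracy(pred, y):
    """Fraction of predictions that match the labels."""
    return float((pred == y).mean())


def robustness_curve(mod_a, mod_b, y, noise_levels, seed=1):
    """Accuracy as Gaussian noise of increasing std is added to modality A only."""
    rng = np.random.default_rng(seed)
    results = []
    for sigma in noise_levels:
        noisy_a = mod_a + sigma * rng.standard_normal(mod_a.shape)
        results.append((sigma, accuracy(late_fusion_predict(noisy_a, mod_b), y)))
    return results


def missing_modality_accuracy(mod_a, mod_b, y):
    """Accuracy when modality A is missing; zero imputation is an assumption here."""
    return accuracy(late_fusion_predict(np.zeros_like(mod_a), mod_b), y)
```

Calling robustness_curve(mod_a, mod_b, y, [0.0, 5.0, 20.0]) on the output of make_toy_data() returns (noise level, accuracy) pairs, so the drop relative to the clean setting can be read directly. Zero imputation is only one way to simulate a missing modality; other choices would change the numbers.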
Abstract
Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, human-computer interaction, and healthcare. Unfortunately, multimodal research has seen limited resources to study (1) generalization across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing modalities. In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. MultiBench provides an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
