TL;DR
This paper introduces MULTIBENCH++, a large-scale, unified benchmark with over 30 datasets across 15 modalities for evaluating multimodal fusion methods, addressing evaluation biases and fostering fair comparisons.
Contribution
It presents a comprehensive, domain-adaptive benchmark and an open-source evaluation pipeline, enabling rigorous, reproducible assessment of multimodal fusion models.
Findings
The paper establishes new performance baselines across multiple tasks.
It demonstrates the benchmark's effectiveness in evaluating diverse fusion methods.
Abstract
Although multimodal fusion has made significant progress, its advancement is severely hindered by the lack of adequate evaluation benchmarks. Current fusion methods are typically evaluated on a small selection of public datasets, a limited scope that inadequately represents the complexity and diversity of real-world scenarios, potentially leading to biased evaluations. This issue presents a twofold challenge. On one hand, models may overfit to the biases of specific datasets, hindering their generalization to broader practical applications. On the other hand, the absence of a unified evaluation standard makes fair and objective comparisons between different fusion methods difficult. Consequently, a truly universal and high-performance fusion model has yet to emerge. To address these challenges, we have developed a large-scale, domain-adaptive benchmark for multimodal evaluation. This…