MuSciClaims: Multimodal Scientific Claim Verification
Yash Kumar Lal, Manikanta Bandham, Mohammad Saqib Hasan, Apoorva Kashi, Mahnaz Koupaee, Niranjan Balasubramanian

TL;DR
MuSciClaims introduces a new benchmark for scientific claim verification using multimodal data, revealing current models' poor performance and highlighting key challenges in evidence localization and multimodal reasoning.
Contribution
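The paper presents MuSciClaims, a benchmark for evaluating multimodal scientific claim verification models. It pairs supported claims extracted automatically from scientific articles with contradicted claims produced by manually perturbing them, and adds a suite of diagnostic tasks for analyzing model failures.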
Findings
Most vision-language models perform poorly (~0.3-0.5 F1)
Even the best model achieves only 0.72 F1
Models struggle with evidence localization and multimodal reasoning
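As a rough illustration of how F1 figures like those above are computed for a claim verification task, here is a minimal sketch. The two-label set, the label strings, the toy predictions, and the choice of macro-averaging are all assumptions made for illustration; this summary does not state the benchmark's exact label set or averaging scheme.

```python
def per_label_f1(gold, pred, label):
    """F1 for one label, treating that label as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels):
    """Unweighted mean of the per-label F1 scores."""
    return sum(per_label_f1(gold, pred, label) for label in labels) / len(labels)


# Hypothetical labels and toy data, not taken from the benchmark.
LABELS = ("SUPPORT", "CONTRADICT")
gold = ["SUPPORT", "SUPPORT", "CONTRADICT", "CONTRADICT"]
pred = ["SUPPORT", "CONTRADICT", "CONTRADICT", "CONTRADICT"]

score = macro_f1(gold, pred, LABELS)
print(f"macro-F1 = {score:.3f}")
```

Macro-averaging weights every label equally, so a model that predicts one label for every claim is penalized even when that label dominates the data.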
Abstract
Assessing scientific claims requires identifying, extracting, and reasoning with multimodal data expressed in information-rich figures in scientific literature. Despite the large body of work in scientific QA, figure captioning, and other multimodal reasoning tasks over chart-based data, there are no readily usable multimodal benchmarks that directly test claim verification abilities. To remedy this gap, we introduce a new benchmark, MuSciClaims, accompanied by diagnostic tasks. We automatically extract supported claims from scientific articles, which we manually perturb to produce contradicted claims. The perturbations are designed to test for a specific set of claim verification capabilities. We also introduce a suite of diagnostic tasks that help understand model failures. Our results show most vision-language models are poor (~0.3-0.5 F1), with even the best model only achieving 0.72…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Biomedical Text Mining and Ontologies
Methods: Sparse Evolutionary Training
