Probing the limitations of multimodal language models for chemistry and materials research
Nawaf Alampara, Mara Schilling-Wilhelmi, Martiño Ríos-García, Indrajeet Mandal, Pranav Khetarpal, Hargun Singh Grover, N. M. Anoop Krishnan, Kevin Maik Jablonka

TL;DR
This paper evaluates vision-language models on chemistry and materials science tasks. The models are strong at perception but show significant limitations in reasoning and inference.
Contribution
Introduces MaCBench, a benchmark for assessing multimodal models in scientific tasks, and systematically evaluates leading models' capabilities and limitations.
Findings
High accuracy in equipment identification and data extraction
Fundamental challenges in spatial reasoning and multi-step inference
Implications for developing reliable scientific AI assistants
Abstract
Recent advancements in artificial intelligence have sparked interest in scientific assistants that could support researchers across the full spectrum of scientific workflows, from literature review to experimental design and data analysis. A key capability for such systems is the ability to process and reason about scientific information in both visual and textual forms - from interpreting spectroscopic data to understanding laboratory setups. Here, we introduce MaCBench, a comprehensive benchmark for evaluating how vision-language models handle real-world chemistry and materials science tasks across three core aspects: data extraction, experimental understanding, and results interpretation. Through a systematic evaluation of leading models, we find that while these systems show promising capabilities in basic perception tasks - achieving near-perfect performance in equipment…
Taxonomy
Topics: Semantic Web and Ontologies · Linguistics and Terminology Studies · Natural Language Processing Techniques
