MatViX: Multimodal Information Extraction from Visually Rich Articles
Ghazal Khalighinejad, Sharon Scott, Ollie Liu, Kelly L. Anderson, Rickard Stureborg, Aman Tyagi, Bhuwan Dhingra

TL;DR
MatViX is a new benchmark dataset and evaluation framework for multimodal information extraction from scientific articles, focusing on text, figures, and tables to facilitate materials science research.
Contribution
We introduce MatViX, a multimodal dataset of full-length articles with expert-curated structured JSON annotations, together with an evaluation method, and use it to benchmark vision-language models on information extraction from scientific literature.
Findings
Vision-language models show room for improvement in multimodal scientific information extraction.
Specialized models such as DePlot improve curve extraction performance.
The dataset and evaluation tools are publicly available for future research.
Abstract
Multimodal information extraction (MIE) is crucial for scientific literature, where valuable data is often spread across text, figures, and tables. In materials science, extracting structured information from research articles can accelerate the discovery of new materials. However, the multimodal nature and complex interconnections of scientific content present challenges for traditional text-based methods. We introduce MatViX, a benchmark consisting of full-length research articles and complex structured JSON files, carefully curated by domain experts. These JSON files are extracted from text, tables, and figures in full-length documents, providing a comprehensive challenge for MIE. We introduce an evaluation method to assess the accuracy of curve similarity and the alignment of hierarchical structures. Additionally, we benchmark vision-language models (VLMs) in…
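Illustration
The abstract excerpt above mentions scoring curve similarity but does not say which metric is used. As a minimal sketch, assuming a discrete Fréchet distance between an extracted curve and a reference curve (the function name and the sample points are hypothetical, not taken from the MatViX tooling), such a comparison could look like this:

```python
import math


def discrete_frechet(p, q):
    """Discrete Frechet distance between two polylines.

    p and q are non-empty sequences of (x, y) points. This metric is an
    assumption chosen for illustration; it is not confirmed to be the
    metric used by the MatViX evaluation.
    """
    if not p or not q:
        raise ValueError("both curves need at least one point")
    n, m = len(p), len(q)
    # ca[i][j] holds the coupling distance for prefixes p[:i+1], q[:j+1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


# Hypothetical data: a reference curve and an extraction shifted up by 1.
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
extracted = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(discrete_frechet(reference, extracted))
```

A smaller distance means the extracted curve tracks the reference more closely; in practice both curves would likely need to be normalized to comparable axis ranges first.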
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
