Matter-of-Fact: A Benchmark for Verifying the Feasibility of Literature-Supported Claims in Materials Science
Peter Jansen, Samiah Hassan, Ruoyao Wang

TL;DR
Matter-of-Fact introduces a challenging benchmark dataset for assessing the feasibility of scientific claims in materials science, aiming to improve automated hypothesis filtering for scalable scientific discovery.
Contribution
The paper presents a new dataset and task for verifying the feasibility of literature-supported claims, highlighting current model limitations and potential for accelerating materials science research.
Findings
Strong baselines achieve up to 72% accuracy, still below human performance.
Current models struggle with the task, indicating room for improvement.
Nearly all claims are solvable by domain experts, showing the dataset's relevance.
Abstract
Contemporary approaches to assisted scientific discovery use language models to automatically generate large numbers of potential hypotheses to test, while also automatically generating code-based experiments to test those hypotheses. While hypotheses can be comparatively inexpensive to generate, automated experiments can be costly, particularly when run at scale (i.e., thousands of experiments). Developing the capacity to filter hypotheses based on their feasibility would allow discovery systems to run at scale, while increasing their likelihood of making significant discoveries. In this work we introduce Matter-of-Fact, a challenge dataset for determining the feasibility of hypotheses framed as claims, while operationalizing feasibility assessment as a temporally-filtered claim verification task using backtesting. Matter-of-Fact includes 8.4k claims extracted from scientific articles…
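The abstract frames feasibility assessment as temporally-filtered claim verification with backtesting. One plausible reading is that a system judging a claim may only consult literature published before the paper the claim was drawn from. The Python sketch below illustrates that reading only. The names `Document`, `Claim`, `visible_documents`, `backtest_accuracy`, and the `predict` callback are hypothetical and are not taken from the paper or its released code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    doc_id: str
    published: date
    text: str


@dataclass(frozen=True)
class Claim:
    text: str
    source_published: date  # assumed: publication date of the claim's source paper
    feasible: bool          # gold label: is the claim feasible?


def visible_documents(corpus, claim):
    """Keep only documents published strictly before the claim's source paper."""
    return [d for d in corpus if d.published < claim.source_published]


def backtest_accuracy(claims, corpus, predict):
    """Score predict(claim_text, documents) -> bool on pre-claim literature only."""
    if not claims:
        raise ValueError("no claims to evaluate")
    correct = sum(
        predict(c.text, visible_documents(corpus, c)) == c.feasible
        for c in claims
    )
    return correct / len(claims)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    corpus = [
        Document("older", date(2020, 1, 1), "prior result"),
        Document("newer", date(2023, 1, 1), "later result"),
    ]
    claims = [Claim("example claim", date(2022, 6, 1), True)]
    # Placeholder predictor: calls a claim feasible if any prior literature exists.
    print(backtest_accuracy(claims, corpus, lambda text, docs: len(docs) > 0))
```

The strict date comparison keeps the claim's own source paper, and anything published after it, out of the evidence pool, so a retrieval-based baseline cannot simply look up the answer.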
Taxonomy
Topics: Machine Learning in Materials Science · Scientific Computing and Data Management · Intellectual Property and Patents
