Question Answering models for information extraction from perovskite materials science literature
M. Sipilä, F. Mehryary, S. Pyysalo, F. Ginter, Milica Todorović

TL;DR
This paper introduces a QA-based approach for extracting material-property relationships from scientific literature, demonstrating improved accuracy in identifying perovskite bandgaps and highlighting the method's potential for materials discovery.
Contribution
The study presents a novel QA workflow built on large language models fine-tuned for question answering, and reports F1-scores that improve on the current state of the art for extracting perovskite bandgaps from scientific text.
Findings
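QA MatBERT achieved the highest extraction accuracy of the five fine-tuned models tested
F1-scores improved on the current state of the art
The QA approach is simple, versatile and accurate, which points to considerable potential for text-driven materials research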
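For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below shows one way to compute it for a set of extracted (material, bandgap) pairs. It assumes exact-match scoring against a gold-standard set; this summary does not say how the paper counts matches, and the function name and toy values are illustrative, not taken from the paper.

```python
def precision_recall_f1(predicted, gold):
    """Score extracted items against a gold-standard set using exact matches.

    Exact-match scoring is an assumption made for this sketch.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up toy values, used only to exercise the function.
gold = {("material A", "1.5 eV"), ("material B", "2.3 eV"), ("material C", "1.8 eV")}
predicted = {("material A", "1.5 eV"), ("material B", "2.3 eV"), ("material D", "3.0 eV")}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Two of the three predictions are correct and two of the three gold items are found, so precision, recall and F1 all come out at 2/3 in this toy case.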
Abstract
Scientific text is a promising source of data in materials science, with ongoing research into utilising textual data for materials discovery. In this study, we developed and tested a novel approach to extract material-property relationships from scientific publications using the Question Answering (QA) method. QA performance was evaluated for information extraction of perovskite bandgaps based on a human query. We observed considerable variation in results with five different large language models fine-tuned for the QA task. Best extraction accuracy was achieved with the QA MatBERT and F1-scores improved on the current state-of-the-art. This work demonstrates the QA workflow and paves the way towards further applications. The simplicity, versatility and accuracy of the QA approach all point to its considerable potential for text-driven discoveries in materials research.
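The QA workflow described in the abstract can be pictured as a minimal sketch: a human query is turned into a question, a QA model is given the question together with a text passage, and the returned answer span is kept if the model is confident enough. Everything named below is an assumption made for illustration: the question template, the `min_score` threshold, the `Answer` type, and `toy_qa_model`, a regex stand-in that takes the place of a fine-tuned model such as QA MatBERT so the example runs without any downloads.

```python
import re
from typing import Callable, NamedTuple, Optional


class Answer(NamedTuple):
    text: str     # answer span copied from the passage
    score: float  # model confidence in [0, 1]


# Any callable taking (question, context) and returning an Answer or None.
QAModel = Callable[[str, str], Optional[Answer]]


def toy_qa_model(question: str, context: str) -> Optional[Answer]:
    """Hypothetical stand-in for a fine-tuned QA model.

    It ignores the question and returns the first '<number> eV' span.
    """
    match = re.search(r"\d+(?:\.\d+)?\s*eV", context)
    if match is None:
        return None
    return Answer(text=match.group(0), score=1.0)


def extract_bandgap(material: str, passage: str, qa_model: QAModel,
                    min_score: float = 0.5) -> Optional[str]:
    """Ask the QA model for the bandgap of `material` in `passage`."""
    # Assumed question template; the paper's exact wording is not given here.
    question = f"What is the bandgap of {material}?"
    answer = qa_model(question, passage)
    if answer is None or answer.score < min_score:
        return None  # no answer, or the model is not confident enough
    return answer.text


# Invented example sentence, not a quotation from any publication.
passage = "The MAPbI3 film showed a bandgap of 1.6 eV."
result = extract_bandgap("MAPbI3", passage, toy_qa_model)
print(result)
```

Keeping the model behind a plain callable means the same loop could be run with each of several fine-tuned models in turn, which is one way a comparison like the one in the abstract might be organised.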
Taxonomy
Topics: Expert finding and Q&A systems
