PcMSP: A Dataset for Scientific Action Graphs Extraction from Polycrystalline Materials Synthesis Procedure Text
Xianjun Yang, Ya Zhuo, Julia Zuo, Xinlu Zhang, Stephen Wilson, Linda, Petzold

TL;DR
This paper introduces PcMSP, a new high-quality dataset of annotated scientific texts on polycrystalline materials synthesis, enabling improved information extraction and natural language processing tasks in materials science.
Contribution
It provides the first annotated dataset for synthesis procedures in materials science, along with a comprehensive annotation scheme and baseline NLP task evaluations.
Findings
State-of-the-art models perform well but still have significant room for improvement.
High inter-annotator agreement indicates dataset quality.
Error analysis reveals unique challenges in scientific text extraction.
Abstract
Scientific action graphs extraction from materials synthesis procedures is important for reproducible research, machine automation, and material prediction. But the lack of annotated data has hindered progress in this field. We demonstrate an effort to annotate Polycrystalline Materials Synthesis Procedures (PcMSP) from 305 open access scientific articles for the construction of synthesis action graphs. This is a new dataset for material science information extraction that simultaneously contains the synthesis sentences extracted from the experimental paragraphs, as well as the entity mentions and intra-sentence relations. A two-step human annotation and inter-annotator agreement study guarantee the high quality of the PcMSP corpus. We introduce four natural language processing tasks: sentence classification, named entity recognition, relation classification, and joint extraction of…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMachine Learning in Materials Science · Topic Modeling · Software Engineering Research
