The SOFC-Exp Corpus and Neural Approaches to Information Extraction in the Materials Science Domain
Annemarie Friedrich, Heike Adel, Federico Tomazic, Johannes Hingerl, Renou Benteau, Anika Maruscyk, Lukas Lange

TL;DR
This paper introduces an annotated corpus and neural models for extracting experimental information, such as materials and measurement conditions, from materials science publications on solid oxide fuel cells (SOFCs).
Contribution
It provides a novel annotation scheme, the SOFC-Exp corpus of 45 open-access articles annotated by domain experts, and strong neural baseline models for named entity recognition, slot filling, and related tasks on this data.
Findings
BERT embeddings lead to large performance gains on all tasks
Adding a recurrent layer benefits complex tasks
High inter-annotator agreement indicates annotation quality
Abstract
This paper presents a new challenging information extraction task in the domain of materials science. We develop an annotation scheme for marking information on experiments related to solid oxide fuel cells in scientific publications, such as involved materials and measurement conditions. With this paper, we publish our annotation guidelines, as well as our SOFC-Exp corpus consisting of 45 open-access scholarly articles annotated by domain experts. A corpus and an inter-annotator agreement study demonstrate the complexity of the suggested named entity recognition and slot filling tasks as well as high annotation quality. We also present strong neural-network based models for a variety of tasks that can be addressed on the basis of our new data set. On all tasks, using BERT embeddings leads to large performance gains, but with increasing task complexity, adding a recurrent neural network…
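Named entity recognition of the kind described in the abstract is commonly cast as token-level sequence tagging, which is the setting a BERT-based tagger (optionally with a recurrent layer on top) is trained in. A minimal sketch of the widely used BIO encoding is given here for illustration only: the labels MATERIAL and VALUE, the example sentence, and the helper names `spans_to_bio` and `bio_to_spans` are hypothetical assumptions, not the SOFC-Exp annotation format or the authors' code.

```python
from typing import List, Optional, Tuple

# Hypothetical entity span: (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Convert non-overlapping entity spans into one BIO tag per token."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        if not (0 <= start < end <= num_tokens):
            raise ValueError(f"invalid span {(start, end, label)}")
        if any(t != "O" for t in tags[start:end]):
            raise ValueError("overlapping spans are not supported")
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # continuation tokens
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Recover entity spans from BIO tags (e.g. a tagger's predictions)."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    # Append an "O" sentinel so an entity ending at the last token is closed.
    for i, tag in enumerate(tags + ["O"]):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if tag == "O" or starts_new:
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
            if starts_new:
                # An I- tag without a matching open entity is treated as a start.
                start, label = i, tag[2:]
    return spans


if __name__ == "__main__":
    tokens = ["The", "LSCF", "cathode", "was", "tested", "at", "800", "°C"]
    gold = [(1, 2, "MATERIAL"), (6, 8, "VALUE")]  # hypothetical annotation
    tags = spans_to_bio(len(tokens), gold)
    print(list(zip(tokens, tags)))
    print(bio_to_spans(tags))
```

Treating a stray `I-` tag as the start of a new entity is one common, lenient way to decode ill-formed predictions; stricter decoders instead discard such tokens.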
Taxonomy
Methods: Linear Layer · Weight Decay · Softmax · Adam · Multi-Head Attention · Dropout · Attention Dropout · Linear Warmup With Linear Decay · Dense Connections
