Automatic extraction of materials and properties from superconductors scientific literature

Luca Foppiano; Pedro Baptista de Castro; Pedro Ortiz Suarez; Kensei Terashima; Yoshihiko Takano; Masashi Ishii

arXiv:2210.15600 · cs.CL · November 27, 2023


TL;DR

This paper presents Grobid-superconductors, a tool that automatically extracts superconductor materials and their properties from scientific literature, enabling the creation of a large, structured database for materials informatics.

Contribution

We developed Grobid-superconductors, a novel system that combines machine learning and heuristics to extract superconductor data from raw text and PDF documents, and used it to build the SuperCon2 database from 37,700 papers.
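To make the idea of a multi-step system that pairs machine learning with heuristics concrete, here is a minimal Python sketch. It is not Grobid-superconductors' code: the two regular expressions are a hypothetical stand-in for a trained tagger, and the rule "link each Tc mention to the closest preceding material mention" is an assumed heuristic chosen only for illustration. The names `tag`, `link`, and `extract` are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Span:
    label: str  # "material" or "tc"
    text: str
    start: int  # character offset in the sentence


# Step 1 (hypothetical stand-in for a trained tagger): two regular expressions.
# _MATERIAL matches runs of two or more element-like tokens (e.g. "MgB2");
# _TC matches temperatures written in kelvin (e.g. "39 K").
_MATERIAL = re.compile(r"\b(?:[A-Z][a-z]?\d*(?:\.\d+)?){2,}\b")
_TC = re.compile(r"\b\d+(?:\.\d+)?\s?K\b")


def tag(sentence: str) -> list[Span]:
    """Return all material and Tc mentions, ordered by position."""
    spans = [Span("material", m.group(), m.start()) for m in _MATERIAL.finditer(sentence)]
    spans += [Span("tc", m.group(), m.start()) for m in _TC.finditer(sentence)]
    return sorted(spans, key=lambda s: s.start)


# Step 2 (assumed heuristic): link each Tc mention to the closest material
# mention that precedes it in the same sentence.
def link(spans: list[Span]) -> list[tuple[str, str]]:
    links = []
    last_material = None
    for span in spans:
        if span.label == "material":
            last_material = span
        elif last_material is not None:
            links.append((last_material.text, span.text))
    return links


def extract(sentence: str) -> list[tuple[str, str]]:
    return link(tag(sentence))


print(extract("MgB2 has a Tc of 39 K, while YBa2Cu3O7 reaches 92 K."))
# [('MgB2', '39 K'), ('YBa2Cu3O7', '92 K')]
```

Keeping tagging and linking as separate steps means the stand-in regexes could be swapped for a learned model without touching the linking rule, and vice versa.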

Findings

01

Extracted 40,324 material and property records from 37,700 papers.

02

Achieved high accuracy in identifying material names and properties.

03

Enabled large-scale data collection for superconductors.

Abstract

The automatic extraction of materials and related properties from the scientific literature is gaining attention in data-driven materials science (Materials Informatics). In this paper, we discuss Grobid-superconductors, our solution for automatically extracting superconductor material names and their properties from text. Built as a Grobid module, it combines machine learning and heuristic approaches in a multi-step architecture that accepts input as raw text or PDF documents. Using Grobid-superconductors, we built SuperCon2, a database of 40,324 material and property records from 37,700 papers. Each material (or sample) is represented by its name, chemical formula, and material class, with shape, doping, substitution variables for components, and substrate recorded as adjoined information. The properties include the superconducting critical temperature (Tc)…
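The abstract lists the fields that describe a material, including substitution variables for components (as in La2-xSrxCuO4). The Python sketch below is one possible way to hold such a record and resolve a variable into a concrete formula. It assumes a hypothetical `MaterialRecord` layout and hypothetical helpers `resolve_formula` and `expand`. It is not the SuperCon2 schema and handles only flat formulas (no parentheses) whose stoichiometries are linear in single-letter variables.

```python
from __future__ import annotations

import itertools
import re
from dataclasses import dataclass, field, replace


@dataclass
class MaterialRecord:
    """Hypothetical record mirroring the material fields named in the abstract."""
    name: str | None = None
    formula: str | None = None
    material_class: str | None = None
    shape: str | None = None
    doping: str | None = None
    substrate: str | None = None
    variables: dict[str, float] = field(default_factory=dict)  # e.g. {"x": 0.15}


def _evaluate(expr: str, values: dict[str, float]) -> float:
    """Evaluate a linear stoichiometry such as '4', '2-x' or '1-2y'."""
    total = 0.0
    for term in re.findall(r"[+-]?[^+-]+", expr):
        sign = -1.0 if term.startswith("-") else 1.0
        match = re.fullmatch(r"(\d*\.?\d*)([a-z]?)", term.lstrip("+-"))
        if match is None:
            raise ValueError(f"unsupported stoichiometry term: {term!r}")
        coefficient = float(match.group(1)) if match.group(1) else 1.0
        if match.group(2):
            coefficient *= values[match.group(2)]  # KeyError if the variable has no value
        total += sign * coefficient
    return total


def resolve_formula(formula: str, values: dict[str, float]) -> str:
    """Substitute values for single-letter variables in a flat formula."""
    var_chars = "".join(sorted(values))
    if var_chars:
        # A lowercase letter that is a variable name is not part of the element
        # symbol, so "Srx" splits into "Sr" + "x".
        element = rf"[A-Z](?:(?![{var_chars}])[a-z])?"
    else:
        element = r"[A-Z][a-z]?"
    token = re.compile(rf"({element})([0-9.+\-{var_chars}]*)")
    parts = []
    for symbol, stoichiometry in token.findall(formula):
        if not stoichiometry:
            parts.append(symbol)
            continue
        amount = _evaluate(stoichiometry, values)
        if abs(amount) < 1e-9:
            continue  # the element is absent for this choice of variables
        parts.append(symbol if abs(amount - 1) < 1e-9 else f"{symbol}{amount:g}")
    return "".join(parts)


def expand(record: MaterialRecord, choices: dict[str, list[float]]) -> list[MaterialRecord]:
    """Produce one record per combination of variable values."""
    names = sorted(choices)
    expanded = []
    for combo in itertools.product(*(choices[n] for n in names)):
        values = dict(zip(names, combo))
        formula = resolve_formula(record.formula, values) if record.formula else None
        expanded.append(replace(record, formula=formula, variables=values))
    return expanded


lsco = MaterialRecord(name="LSCO", formula="La2-xSrxCuO4", material_class="cuprate")
for r in expand(lsco, {"x": [0.1, 0.15]}):
    print(r.formula, r.variables)
# La1.9Sr0.1CuO4 {'x': 0.1}
# La1.85Sr0.15CuO4 {'x': 0.15}
```

Storing the variable assignment next to the resolved formula keeps each expanded record traceable to the template formula reported in the paper.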


Taxonomy

Methods: Linear Layer · Residual Connection · Weight Decay · Attention Dropout · Linear Warmup With Linear Decay · WordPiece · Adam · Dropout · Softmax · Dense Connections