Analyzing Research Trends in Inorganic Materials Literature Using NLP
Fusataka Kuniyoshi, Jun Ozawa, Makoto Miwa

TL;DR
This paper develops an NLP pipeline to extract material names and properties from inorganic materials literature, enabling trend analysis and knowledge retrieval in materials science research.
Contribution
It introduces a new annotated corpus and a named entity recognition model for extracting key information from materials science papers, facilitating large-scale literature analysis.
Findings
The NER model achieves a micro-F1 score of 78.1%
The pipeline supports trend analysis of materials research across publication years
The analysis shows increasing research interest in MoS2 in China
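For readers unfamiliar with the metric in the first finding, the sketch below shows one common way to compute a micro-averaged F1 score for NER: true positives, false positives, and false negatives are pooled over all entity spans before precision and recall are taken. It is a minimal illustration, not the authors' evaluation code. The exact-match span criterion, the `micro_f1` function name, and the `MATERIAL` / `PROPERTY` labels are assumptions made for this example; the paper's own label definition is not reproduced on this page.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over entity spans (exact-match criterion, assumed).

    gold, pred: lists with one entry per paragraph; each entry is a set of
    (start, end, label) tuples describing the entity spans in that paragraph.
    """
    tp = fp = fn = 0
    for gold_spans, pred_spans in zip(gold, pred):
        tp += len(gold_spans & pred_spans)   # spans predicted exactly right
        fp += len(pred_spans - gold_spans)   # predicted but not in gold
        fn += len(gold_spans - pred_spans)   # in gold but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: two paragraphs, labels invented for illustration.
gold = [
    {(0, 4, "MATERIAL"), (10, 18, "PROPERTY")},
    {(3, 7, "MATERIAL")},
]
pred = [
    {(0, 4, "MATERIAL"), (10, 17, "PROPERTY")},  # second span has a wrong boundary
    {(3, 7, "MATERIAL")},
]

score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.3f}")
```

In this toy case two of the three predicted spans match the gold annotation exactly, so precision and recall are both 2/3. Because counts are pooled before averaging, frequent entity types weigh more heavily in a micro-averaged score than rare ones.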
Abstract
In the field of inorganic materials science, there is a growing demand to extract knowledge such as physical properties and synthesis processes of materials by machine-reading a large number of papers. This is because materials researchers refer to many papers in order to come up with promising terms of experiments for material synthesis. However, there are only a few systems that can extract material names and their properties. This study proposes a large-scale natural language processing (NLP) pipeline for extracting material names and properties from materials science literature to enable the search and retrieval of results in materials science. To this end, we propose a label definition for extracting material names and properties and accordingly build a corpus containing 836 annotated paragraphs extracted from 301 papers for training a named entity recognition (NER) model.…
Taxonomy
Topics: Machine Learning in Materials Science · Topic Modeling · Data Quality and Management
