Extracting ORR Catalyst Information for Fuel Cell from Scientific Literature
Hein Htet, Amgad Ahmed Ali Ibrahim, Yutaka Sasaki, Ryoji Asahi

TL;DR
This paper develops a transformer-based information extraction method that automatically extracts oxygen reduction reaction (ORR) catalyst entities, and the relations between them, from scientific literature, with the aim of improving data collection for fuel cell research.
Contribution
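It introduces a manually annotated dataset (12 entity types and two relation types) and fine-tuned BERT variants, used with DyGIE++, for extracting ORR catalyst entities and relations, and reports that domain-specific models outperform general scientific models.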
Findings
PubMedBERT achieves 82.19% NER F1-score.
MatSciBERT attains 66.10% relation extraction F1-score.
Domain-specific BERT models outperform general scientific models.
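For readers unfamiliar with how F1-scores like those above are typically computed for NER, the following is a minimal sketch assuming exact-match, micro-averaged scoring over labeled spans. The summary does not state the paper's exact evaluation protocol, so this is illustrative only; the function name, the span tuple layout, and the entity labels (CATALYST, PROPERTY, VALUE) are hypothetical and not taken from the paper.

```python
from typing import Set, Tuple

# A span is (start_token, end_token, label). This layout is an assumption
# made for illustration, not the paper's annotation schema.
Span = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match scoring: a prediction counts only if span and label both match."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold and predicted entity spans for one sentence.
gold = {(0, 1, "CATALYST"), (5, 5, "PROPERTY"), (8, 9, "VALUE")}
pred = {(0, 1, "CATALYST"), (5, 5, "VALUE"), (8, 9, "VALUE")}

p, r, f1 = precision_recall_f1(gold, pred)
print(f"{p:.3f} {r:.3f} {f1:.3f}")  # prints 0.667 0.667 0.667
```

The same function applies to relation extraction if each item is a tuple of (head span, tail span, relation label); a predicted relation then counts as correct only when both argument spans and the label match.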
Abstract
The oxygen reduction reaction (ORR) catalyst plays a critical role in enhancing fuel cell efficiency, making it a key focus in material science research. However, extracting structured information about ORR catalysts from vast scientific literature remains a significant challenge due to the complexity and diversity of textual data. In this study, we propose a named entity recognition (NER) and relation extraction (RE) approach using DyGIE++ with multiple pre-trained BERT variants, including MatSciBERT and PubMedBERT, to extract ORR catalyst-related information from the scientific literature, which is compiled into a fuel cell corpus for materials informatics (FC-CoMIcs). A comprehensive dataset was constructed manually by identifying 12 critical entities and two relationship types between pairs of the entities. Our methodology involves data annotation, integration, and fine-tuning of…
Taxonomy
Topics: Machine Learning in Materials Science · Electrocatalysts for Energy Conversion · Topic Modeling
