Text to Insight: Accelerating Organic Materials Knowledge Extraction via Deep Learning
Xintong Zhao, Steven Lopez, Semion Saikin, Xiaohua Hu, Jane Greenberg

TL;DR
This paper presents a deep learning framework for automated knowledge extraction from scientific literature on organic materials, significantly speeding up the research process by reducing manual effort.
Contribution
It introduces a novel dataset and a BiLSTM-CNN-CRF-based named entity recognition (NER) model for extracting key information from organic materials literature, extending computational knowledge extraction beyond inorganic materials.
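To make the CRF part of a BiLSTM-CNN-CRF tagger concrete, here is a minimal sketch of Viterbi decoding over per-token tag scores. It is an illustration under stated assumptions, not the authors' implementation: the tag set (`O`, `B-MOL`, `I-MOL`), the score values, and the function name `viterbi_decode` are all hypothetical, and in a real model the emission scores would come from the BiLSTM-CNN encoder while the transition scores would be learned.

```python
from typing import Dict, List

def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][tag] is the per-token score for `tag` at position t.
    transitions[prev][cur] is the score for moving from `prev` to `cur`.
    """
    if not emissions:
        return []
    # best[t][tag]: best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t-1][tag]: best previous tag for `tag` at position t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical tag set and scores, for illustration only.
TAGS = ["O", "B-MOL", "I-MOL"]
TRANSITIONS = {p: {c: 0.0 for c in TAGS} for p in TAGS}
TRANSITIONS["O"]["I-MOL"] = -5.0  # discourage an entity that starts with I-
EMISSIONS = [
    {"O": 2.0, "B-MOL": 0.0, "I-MOL": 0.0},
    {"O": 0.0, "B-MOL": 1.0, "I-MOL": 1.2},
    {"O": 0.0, "B-MOL": 0.0, "I-MOL": 2.0},
]
decoded = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
```

In this toy input the second token scores slightly higher as `I-MOL` than as `B-MOL`, so picking the best tag per token independently would produce an invalid `O, I-MOL, I-MOL` sequence. The transition penalty is what lets the decoder prefer a well-formed sequence instead, which is the usual motivation for placing a CRF layer on top of the encoder.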
Findings
High potential for automated knowledge extraction demonstrated
A large annotated dataset was created for organic materials literature
Framework adaptable to other scientific domains
Abstract
Scientific literature is one of the most significant resources for sharing knowledge. Researchers turn to scientific literature as a first step in designing an experiment. Given the extensive and growing volume of literature, the common approach of reading and manually extracting knowledge is too time-consuming, creating a bottleneck in the research cycle. This challenge spans nearly every scientific domain. In materials science, experimental data distributed across millions of publications are extremely helpful for predicting materials properties and designing novel materials. However, only recently have researchers explored computational approaches to knowledge extraction, and primarily for inorganic materials. This study aims to explore knowledge extraction for organic materials. We built a research dataset composed of 855 annotated and 708,376 unannotated sentences drawn from…
Taxonomy
Topics: Machine Learning in Materials Science · Topic Modeling · Advanced Text Analysis Techniques
