Looking Through Glass: Knowledge Discovery from Materials Science Literature using Natural Language Processing
Vineeth Venugopal, Sourav Sahoo, Mohd Zaki, Manish Agarwal, Nitya Nand Gosvami, N. M. Anoop Krishnan

TL;DR
This paper introduces a natural language processing framework that automates the extraction and classification of knowledge from materials science literature, including text, images, and chemical data, to facilitate materials discovery.
Contribution
It presents a novel integrated approach combining text classification, image summarization, and chemical element mapping for materials science literature analysis.
Findings
Automated categorization of abstracts using latent Dirichlet allocation (LDA).
Summarization of images and plots with the Caption Cluster Plot (CCP).
Topical and image-wise mapping of the chemical elements in the literature (Elemental map).
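The first finding, categorizing abstracts with LDA, can be illustrated with a minimal sketch. This summary does not describe the authors' implementation, preprocessing, number of topics, or hyperparameters, so everything below is an assumption: it fits LDA by collapsed Gibbs sampling in plain Python and assigns each abstract to its dominant topic. The names tokenize, lda_gibbs, and top_words, the stopword list, the values of alpha, beta, and n_iters, and the toy abstracts are all hypothetical.

```python
import random
import re

# Illustrative stopword list (assumption; real pipelines use much larger lists).
STOPWORDS = {"the", "of", "and", "a", "an", "in", "to", "is", "for", "with",
             "on", "by", "we", "are", "under", "using"}


def tokenize(text):
    """Lowercase the text, keep alphanumeric tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized docs by collapsed Gibbs sampling (hypothetical helper).

    Returns (theta, vocab, nkw): per-document topic proportions,
    the sorted vocabulary, and the topic-word count matrix.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                       # total tokens per topic
    z = []                                    # current topic of every token

    # Random initial topic assignment for each token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
             for d, doc in enumerate(docs)]
    return theta, vocab, nkw


def top_words(nkw, vocab, topic, n=3):
    """Return the n words most often assigned to a topic."""
    ranked = sorted(range(len(vocab)), key=lambda v: -nkw[topic][v])
    return [vocab[v] for v in ranked[:n]]


# Toy abstracts (hypothetical, not from the paper's corpus).
abstracts = [
    "Bioactive glass dissolves in simulated body fluid and forms hydroxyapatite",
    "Hydroxyapatite layer formation on bioactive glass for bone repair",
    "Hardness and crack resistance of aluminosilicate glass under indentation",
    "Indentation cracking and fracture toughness of oxide glass",
]
docs = [tokenize(a) for a in abstracts]
theta, vocab, nkw = lda_gibbs(docs, n_topics=2)
# Categorize each abstract by its dominant topic.
labels = [max(range(2), key=lambda t: row[t]) for row in theta]
print(labels)
for t in range(2):
    print(t, top_words(nkw, vocab, t))
```

Collapsed Gibbs sampling is used here only because it needs no third-party libraries; a real pipeline would more likely rely on an established LDA implementation and a corpus far larger than four abstracts.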
Abstract
Most of the knowledge in materials science literature is in the form of unstructured data such as text and images. Here, we present a framework employing natural language processing, which automates text and image comprehension and precision knowledge extraction from the literature on inorganic glasses. The abstracts are automatically categorized using latent Dirichlet allocation (LDA), providing a way to classify and search semantically linked publications. Similarly, a comprehensive summary of images and plots is presented using the 'Caption Cluster Plot' (CCP), which provides direct access to the images buried in the papers. Finally, we combine the LDA and CCP with the chemical elements occurring in the manuscript to present an 'Elemental map', a topical and image-wise distribution of chemical elements in the literature. Overall, the framework presented here can be a generic and powerful…
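The Elemental map combines topic (or caption-cluster) labels with the chemical elements that occur in each paper. How the authors detect elements is not stated here, so the sketch below assumes a simple regex heuristic: it parses formula-like tokens such as SiO2 or Na2O and counts, per topic, how many documents mention each element. The functions elements_in_formula and elemental_map, the topic labels, and the example abstracts are hypothetical.

```python
import re
from collections import Counter, defaultdict

# The 118 element symbols, used to validate parsed formula parts.
ELEMENTS = set("""H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn
Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe
Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn
Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og""".split())

# Candidate tokens start with a capital letter, e.g. "SiO2" in "25Na2O-75SiO2".
TOKEN = re.compile(r"[A-Z][A-Za-z0-9]*")
# One element symbol followed by an optional integer subscript.
PART = re.compile(r"([A-Z][a-z]?)(\d*)")


def elements_in_formula(token):
    """Return the element symbols in `token` if it parses fully as a formula.

    Hypothetical heuristic, not the authors' method.
    """
    parts = PART.findall(token)
    if not parts or "".join(s + n for s, n in parts) != token:
        return set()  # leftover characters: an ordinary word such as "Glass"
    symbols = [s for s, _ in parts]
    if not all(s in ELEMENTS for s in symbols):
        return set()
    # A lone symbol without a subscript ("In", "As") is probably an English word.
    if len(symbols) == 1 and not parts[0][1]:
        return set()
    return set(symbols)


def elemental_map(texts, topics):
    """For each topic label, count how many documents mention each element."""
    table = defaultdict(Counter)
    for text, topic in zip(texts, topics):
        found = set()
        for tok in TOKEN.findall(text):
            found |= elements_in_formula(tok)
        table[topic].update(found)  # each element counted once per document
    return table


abstracts = [
    "Dissolution of bioactive glass (SiO2-Na2O-CaO-P2O5) in simulated body fluid",
    "Hardness of Na2O-Al2O3-SiO2 glasses measured by indentation",
    "Crack resistance of CaO-Al2O3-SiO2 glasses",
]
topics = ["bioactive", "mechanical", "mechanical"]  # hypothetical labels, e.g. from LDA
emap = elemental_map(abstracts, topics)
for topic, counts in emap.items():
    print(topic, counts.most_common())
```

The heuristic is deliberately crude: acronyms built from element symbols (for example "CCP" parses as C and P) and glass names such as "45S5" (which yields S) produce false matches, so a real pipeline would need a chemistry-aware tokenizer.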
