From Patents to Dataset: Scraping for Oxide Glass Compositions and Properties
Gustavo Laranja Thomaello, Thomaz Yeiden Busnardo Aguena, Eric Trevelato Costa, Rafael Baságlia Rosante, Thiago Rodrigo Ramos, Daiane Aparecida Zuanetti, and Edgar Dutra Zanotto

TL;DR
This paper develops web scraping methods to extract and structure glass composition and property data from patents, expanding existing databases and enabling improved machine learning models for glass development.
Contribution
It introduces novel web scraping techniques to extract diverse glass data from patents and integrates this data into existing databases for advanced modeling.
Findings
Increased the available data by approximately 10.4% for liquidus temperature, 6.6% for refractive index, and 4.9% for Abbe number.
Expanded compositional diversity with more titanium, magnesium, zirconium, and other oxides.
Enhanced database coverage for glass property prediction models.
Abstract
In this work, we present web scraping techniques to extract information from patent tables, clean and structure them for future use in predictive machine learning models to develop new glasses. We extracted compositions and three properties relevant to the development of new glasses and structured them into a database to be used together with information from other available datasets. We also analyzed the consistency of the information obtained and what it adds to the existing databases. The extracted liquidus temperatures comprise 5,696 compositions; the second subset includes 4,298 refractive indexes and, finally, 1,771 compositions with Abbe numbers. The extraction performed here increases the available information by approximately 10.4% for liquidus temperature, 6.6% for refractive index, and 4.9% for Abbe number. The impact extends beyond quantity: the newly extracted data…
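The extract, clean, and structure steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the table layout (one row per oxide or property, one column per example), the sample HTML, the decimal-comma handling, and the helper names `TableParser`, `to_number`, and `table_to_records` are all assumptions made for illustration. Only the three property names come from the paper.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def to_number(text):
    """Return a float, or None for blanks, dashes, and other non-numeric cells.

    Assumption: a comma is a decimal separator, never a thousands separator.
    """
    try:
        return float(text.replace(",", "."))
    except ValueError:
        return None


def table_to_records(html, property_names):
    """Build one record per example column, splitting oxides from properties."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    records = [
        {"example": name, "composition": {}, "properties": {}}
        for name in header[1:]
    ]
    for label, *cells in body:
        key = "properties" if label in property_names else "composition"
        for record, cell in zip(records, cells):
            value = to_number(cell)
            if value is not None:  # drop empty or placeholder cells
                record[key][label] = value
    return records


# Hypothetical patent-style table, invented for this example.
SAMPLE = """
<table>
<tr><th>Component</th><th>Ex. 1</th><th>Ex. 2</th></tr>
<tr><td>SiO2</td><td>60.0</td><td>55,5</td></tr>
<tr><td>TiO2</td><td>5.0</td><td>-</td></tr>
<tr><td>Liquidus temperature</td><td>1150</td><td>1120</td></tr>
</table>
"""

PROPERTIES = {"Liquidus temperature", "Refractive index", "Abbe number"}
records = table_to_records(SAMPLE, PROPERTIES)
```

Keeping compositions and properties in separate dictionaries makes it straightforward to merge such records with other datasets later. Real patent tables would likely need further handling for units, merged cells, and transposed layouts, none of which this sketch attempts.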
Taxonomy
Topics: Machine Learning in Materials Science · X-ray Diffraction in Crystallography · Inorganic Chemistry and Materials
