ReadMOF: Structure-Free Semantic Embeddings from Systematic MOF Nomenclature for Machine Learning
Kewei Zhu, Cameron Wilson, Bartosz Mazur, Yi Li, Ashleigh M. Chester, Peyman Z. Moghadam

TL;DR
ReadMOF uses pretrained language models to embed the systematic names of metal-organic frameworks (MOFs), enabling property prediction and similarity analysis without atomic coordinates.
Contribution
To the authors' knowledge, ReadMOF is the first framework to use systematic chemical names for structure-property modeling in materials science without relying on geometry or connectivity data.
Findings
Embeddings from ReadMOF achieve performance comparable to traditional structure-based descriptors in predictive tasks.
ReadMOF enables similarity retrieval and clustering of MOFs based on textual names.
Combining ReadMOF with large language models enhances chemical reasoning capabilities.
Abstract
Systematic chemical names, such as IUPAC-style nomenclature for metal-organic frameworks (MOFs), contain rich structural and compositional information in a standardized textual format. Here we introduce ReadMOF, which is, to our knowledge, the first structure-free machine learning framework that leverages these names to model structure-property relationships without requiring atomic coordinates or connectivity graphs. By employing pretrained language models, ReadMOF converts systematic MOF names from the Cambridge Structural Database (CSD) into vector embeddings that closely represent traditional structure-based descriptors. These embeddings enable applications in materials informatics, including property prediction, similarity retrieval, and clustering, with performance comparable to geometry-dependent methods. When combined with large language models, ReadMOF also establishes…
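The name-to-embedding-to-retrieval flow described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: `embed_name` is a hypothetical stand-in that hashes character n-grams into a fixed-length vector, where ReadMOF would use a pretrained language model, and the names in `corpus` are illustrative strings, not actual CSD entries.

```python
import hashlib
import math
from collections import Counter


def embed_name(name, dim=256, n=3):
    """Hypothetical stand-in for a language-model embedding.

    Hashes character n-grams of the name into a `dim`-length vector
    and L2-normalises it. ReadMOF itself uses a pretrained language model.
    """
    text = name.lower()
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    vec = [0.0] * dim
    for gram, count in grams.items():
        digest = hashlib.sha256(gram.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two already-normalised vectors (dot product)."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query, corpus, k=2):
    """Return the k names in `corpus` most similar to `query`, best first."""
    q = embed_name(query)
    scored = [(cosine(q, embed_name(name)), name) for name in corpus]
    return sorted(scored, reverse=True)[:k]


# Illustrative systematic-style names (assumed for this sketch only).
corpus = [
    "catena-[bis(mu-benzene-1,4-dicarboxylato)-dizinc]",
    "catena-[bis(mu-benzene-1,3,5-tricarboxylato)-tricopper]",
    "catena-[tris(mu-2-methylimidazolato)-dicobalt]",
]

for score, name in retrieve(corpus[0], corpus):
    print(f"{score:.3f}  {name}")
```

Because retrieval only needs a vector per name, the stand-in embedder can be replaced by any model that maps a string to a fixed-length vector without changing `retrieve`.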
