Generative Language Model for Catalyst Discovery
Dong Hyeon Mok, Seoin Back

TL;DR
This paper presents CatGPT, a transformer-based language model that generates inorganic catalyst structures, demonstrating its potential as a tool for accelerating catalyst discovery through generative modeling and fine-tuning.
Contribution
Introduction of CatGPT, a pretrained transformer model for inorganic catalyst generation, and its fine-tuning for specific catalyst design tasks.
Findings
High validity and accuracy in generated catalyst structures
Effective fine-tuning for specific catalytic reactions
Potential to accelerate catalyst discovery processes
Abstract
Discovery of novel and promising materials is a critical challenge in the field of chemistry and material science, traditionally approached through methodologies ranging from trial-and-error to machine learning-driven inverse design. Recent studies suggest that transformer-based language models can be utilized as material generative models to expand chemical space and explore materials with desired properties. In this work, we introduce the Catalyst Generative Pretrained Transformer (CatGPT), trained to generate string representations of inorganic catalyst structures from a vast chemical space. CatGPT not only demonstrates high performance in generating valid and accurate catalyst structures but also serves as a foundation model for generating desired types of catalysts by fine-tuning with sparse and specified datasets. As an example, we fine-tuned the pretrained CatGPT using a binary…
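The abstract describes structures being written as strings that a language model can generate and that can then be checked for validity. The sketch below illustrates the idea only. The format (six lattice parameters followed by one element symbol and three fractional coordinates per atom) and the names `Structure`, `to_string`, `from_string`, and `is_valid` are all assumptions made for illustration; they are not taken from the paper and may differ from CatGPT's actual representation and validity criteria.

```python
from dataclasses import dataclass


@dataclass
class Structure:
    """Hypothetical container; not the paper's data model."""
    lattice: tuple      # (a, b, c, alpha, beta, gamma)
    species: list       # element symbols, one per atom
    frac_coords: list   # (x, y, z) fractional coordinates, one per atom


def to_string(s, ndigits=2):
    """Serialize to a whitespace-separated token string (assumed format)."""
    tokens = [f"{v:.{ndigits}f}" for v in s.lattice]
    for sym, xyz in zip(s.species, s.frac_coords):
        tokens.append(sym)
        tokens.extend(f"{c:.{ndigits}f}" for c in xyz)
    return " ".join(tokens)


def from_string(text):
    """Parse the assumed format back; raise ValueError if malformed."""
    tokens = text.split()
    if len(tokens) < 6 or (len(tokens) - 6) % 4 != 0:
        raise ValueError("token count does not match 6 + 4 * n_atoms")
    lattice = tuple(float(t) for t in tokens[:6])
    species, coords = [], []
    for i in range(6, len(tokens), 4):
        if not tokens[i].isalpha():
            raise ValueError(f"expected element symbol, got {tokens[i]!r}")
        species.append(tokens[i])
        coords.append(tuple(float(t) for t in tokens[i + 1:i + 4]))
    return Structure(lattice, species, coords)


def is_valid(text):
    """Structural sanity check only; says nothing about chemistry."""
    try:
        s = from_string(text)
    except ValueError:
        return False
    lengths_ok = all(v > 0 for v in s.lattice[:3])
    angles_ok = all(0 < v < 180 for v in s.lattice[3:])
    coords_ok = all(0 <= c <= 1 for xyz in s.frac_coords for c in xyz)
    return lengths_ok and angles_ok and coords_ok
```

A check of this kind only confirms that a generated string parses into a well-formed structure. Judging whether the structure is physically reasonable or catalytically useful would need separate criteria that this sketch does not attempt.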
Taxonomy
Topics: Topic Modeling · Machine Learning in Materials Science · Data Quality and Management
Methods: Attention Is All You Need · Residual Connection · Adam · Dropout · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Softmax · Position-Wise Feed-Forward Layer
