Catalyst Property Prediction with CatBERTa: Unveiling Feature Exploration Strategies through Large Language Models
Janghoon Ock, Chakradhar Guntuboina, Amir Barati Farimani

TL;DR
This paper introduces CatBERTa, a Transformer-based model that predicts catalyst adsorption energy from textual descriptions rather than atomic coordinates. It reaches accuracy comparable to GNNs and offers insight into which input features matter.
Contribution
The study presents a novel text-based approach for catalyst property prediction using a pretrained Transformer, bypassing the need for graph representations and atomic coordinate data.
Findings
CatBERTa achieves an MAE of 0.75 eV, comparable to GNNs.
Subtracting CatBERTa-predicted energies for chemically similar systems cancels systematic errors by up to 19.3%.
Attention analysis reveals focus on adsorbates and bulk composition tokens.
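The first two findings rest on simple arithmetic: the mean absolute error (MAE) over a set of predictions, and the way an error shared by two predictions drops out when one is subtracted from the other. The sketch below illustrates both with toy numbers. Every value in it (the energies, the shared bias, the residuals) is invented for illustration and is not taken from the paper, and the paper's own evaluation protocol may differ.

```python
def mean_absolute_error(predicted, reference):
    """Average of |predicted - reference| over paired values (same units, here eV)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Toy reference adsorption energies (eV) for two sets of chemically similar systems.
# All numbers are made up for this illustration.
reference_a = [-1.20, -0.50, 0.30]
reference_b = [-1.00, -0.80, 0.10]

# Assume the model's error has a systematic part shared by similar systems
# plus a small system-specific residual.
shared_bias = 0.40
residual_a = [0.05, -0.03, 0.02]
residual_b = [0.01, 0.02, -0.04]

predicted_a = [e + shared_bias + r for e, r in zip(reference_a, residual_a)]
predicted_b = [e + shared_bias + r for e, r in zip(reference_b, residual_b)]

# Error on the individual energies is dominated by the shared bias.
mae_a = mean_absolute_error(predicted_a, reference_a)

# Error on the energy differences: the shared bias cancels in the subtraction,
# leaving only the difference of the residuals.
diff_predicted = [a - b for a, b in zip(predicted_a, predicted_b)]
diff_reference = [a - b for a, b in zip(reference_a, reference_b)]
mae_diff = mean_absolute_error(diff_predicted, diff_reference)

print(f"MAE on individual energies: {mae_a:.3f} eV")
print(f"MAE on energy differences:  {mae_diff:.3f} eV")
```

In this toy setup the difference error is much smaller than the individual error because the bias was assumed to be identical across the pair. Real systematic errors are only partly shared, so the cancellation is partial.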
Abstract
Efficient catalyst screening necessitates predictive models for adsorption energy, a key property of reactivity. However, prevailing methods, notably graph neural networks (GNNs), demand precise atomic coordinates for constructing graph representations, while integrating observable attributes remains challenging. This research introduces CatBERTa, an energy prediction Transformer model using textual inputs. Built on a pretrained Transformer encoder, CatBERTa processes human-interpretable text, incorporating target features. Attention score analysis reveals CatBERTa's focus on tokens related to adsorbates, bulk composition, and their interacting atoms. Moreover, interacting atoms emerge as effective descriptors for adsorption configurations, while factors such as bond length and atomic properties of these atoms offer limited predictive contributions. By predicting adsorption energy from…
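To make the idea of a textual input and a per-feature attention analysis concrete, here is a minimal sketch. The string layout produced by `build_description`, the helper `attention_share_by_group`, and the example species, tokens, and scores are all hypothetical. They are not the paper's actual input format, tokenizer, or measured attention values. They only show how text assembled from feature groups could be tied back to the share of attention each group receives.

```python
from collections import defaultdict


def build_description(adsorbate, bulk, interacting_pairs):
    """Join three feature groups into one human-readable string (hypothetical layout)."""
    pairs = ", ".join(f"{ads_atom}-{surf_atom}" for ads_atom, surf_atom in interacting_pairs)
    return f"adsorbate: {adsorbate} | bulk: {bulk} | interacting atoms: {pairs}"


def attention_share_by_group(scores, groups):
    """Sum per-token attention scores by feature group and normalise to fractions."""
    if len(scores) != len(groups):
        raise ValueError("scores and groups must have the same length")
    totals = defaultdict(float)
    for score, group in zip(scores, groups):
        totals[group] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("attention scores sum to zero")
    return {group: total / grand_total for group, total in totals.items()}


# Illustrative system: an OH adsorbate on a Pt-Ni alloy (invented example).
text = build_description("*OH", "Pt3Ni", [("O", "Pt"), ("O", "Ni")])
print(text)

# Toy tokens with made-up attention scores and the feature group each belongs to.
tokens = ["*OH", "Pt3Ni", "O-Pt", "O-Ni"]
groups = ["adsorbate", "bulk", "interacting", "interacting"]
scores = [0.4, 0.3, 0.2, 0.1]

shares = attention_share_by_group(scores, groups)
for group, share in shares.items():
    print(f"{group}: {share:.2f}")
```

With a real model, the scores would come from the encoder's attention weights rather than a hand-written list, and tokens would follow the model's own subword tokenization.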
Taxonomy
Topics: Machine Learning in Materials Science · Topic Modeling · Advanced Graph Neural Networks
