Beyond the Training Domain: Robust Generative Transition State Models for Unseen Chemistry
Samir Darouich, Jacob W. Toney, Weiliang Luo, Johannes Kästner, Mathias Niepert, Heather J. Kulik

TL;DR
This paper develops robust generative models for transition state prediction that generalize beyond small organic molecules by using self-supervised pretraining, significantly improving accuracy on unseen chemical systems.
Contribution
It introduces a self-supervised pretraining strategy that enhances the generalization of generative transition state models to unseen elements and complex reactions.
Findings
Self-supervised pretraining reduces median RMSD from 0.39 to 0.19 Å.
Pretraining reduces the amount of training data needed by up to 75%.
Models are more robust in new chemical environments.
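The RMSD figures above compare predicted and reference TS geometries. For scale, going from 0.39 to 0.19 Å is a relative reduction of about 51%. The summary does not say how the RMSD is computed, so the following is only a minimal sketch of one common convention: RMSD after optimal rigid-body superposition (the Kabsch algorithm). The function name `kabsch_rmsd` is hypothetical, and the sketch assumes both geometries list the same atoms in the same order, with no handling of atom permutations or symmetry.

```python
import numpy as np


def kabsch_rmsd(pred, ref):
    """RMSD (in the input units, e.g. Å) after optimal rigid-body alignment.

    Assumption: `pred` and `ref` are (n_atoms, 3) arrays with identical
    atom ordering. This is an illustrative convention, not necessarily
    the metric definition used in the paper.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if pred.shape != ref.shape or pred.ndim != 2 or pred.shape[1] != 3:
        raise ValueError("pred and ref must both have shape (n_atoms, 3)")

    # Remove the translational offset by centering both structures.
    p = pred - pred.mean(axis=0)
    q = ref - ref.mean(axis=0)

    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)

    # Flip one axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    d = -1.0 if np.linalg.det(vt.T @ u.T) < 0 else 1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    # Rotate the centered prediction onto the centered reference.
    p_aligned = p @ rotation.T
    return float(np.sqrt(np.mean(np.sum((p_aligned - q) ** 2, axis=1))))
```

A median over a test set would then be `np.median([kabsch_rmsd(p, r) for p, r in pairs])`, where `pairs` is a hypothetical list of (predicted, reference) coordinate arrays.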
Abstract
Transition states (TSs) govern the rates and outcomes of chemical reactions, making their accurate prediction a central challenge in computational chemistry. Although recent machine-learning models achieve near chemical accuracy in the prediction of TS structures and the associated reaction barriers for small organic reactions, their ability to generalize beyond the training domain remains largely unexplored. Here, we introduce targeted benchmarks to probe chemical and structural novelty in generative TS prediction. Building on Transition1x, a large-scale dataset of reactions involving small organic molecules, we construct curated extensions incorporating controlled elemental substitutions and diverse transition-metal complexes (TMCs). These benchmarks reveal fundamental limitations of generative models in generalizing to previously unseen elements. As a result, they produce…
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Advanced Graph Neural Networks
