GemNet-OC: Developing Graph Neural Networks for Large and Diverse Molecular Simulation Datasets
Johannes Gasteiger, Muhammed Shuaibi, Anuroop Sriram, Stephan Günnemann, Zachary Ulissi, C. Lawrence Zitnick, Abhishek Das

TL;DR
This paper introduces GemNet-OC, a graph neural network developed on the large and diverse OC20 dataset. It outperforms previous models on OC20 by 16% while reducing training time by a factor of 10, and the accompanying analysis highlights why representative datasets matter for developing GNNs that generalize.
Contribution
The paper develops GemNet-OC, a GNN tailored to large-scale molecular datasets, and systematically analyzes how dataset complexity affects model performance and model development choices.
Findings
GemNet-OC outperforms previous models on OC20 by 16%.
Training time is reduced by a factor of 10 with GemNet-OC.
Results on the OC-2M dataset correlate well with performance on the full OC20 dataset.
Abstract
Recent years have seen the advent of molecular simulation datasets that are orders of magnitude larger and more diverse. These new datasets differ substantially in four aspects of complexity: (1) chemical diversity (number of different elements), (2) system size (number of atoms per sample), (3) dataset size (number of data samples), and (4) domain shift (similarity of the training and test set). Despite these large differences, benchmarks on small and narrow datasets remain the predominant method of demonstrating progress in graph neural networks (GNNs) for molecular simulation, likely due to cheaper training compute requirements. This raises the question: does GNN progress on small and narrow datasets translate to these more complex datasets? This work investigates this question by first developing the GemNet-OC model based on the large Open Catalyst 2020 (OC20) dataset. GemNet-OC…
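The four aspects of complexity named in the abstract can be made concrete with a small sketch. The Python below is only an illustration, not the paper's methodology: the toy samples, the function name `complexity_summary`, and the use of "fraction of test-set elements never seen in training" as a stand-in for domain shift are all assumptions introduced here.

```python
from statistics import mean


def complexity_summary(train, test):
    """Summarize four complexity aspects for a toy dataset.

    Each sample is a list of element symbols, one entry per atom.
    """
    train_elements = {el for sample in train for el in sample}
    test_elements = {el for sample in test for el in sample}
    unseen = test_elements - train_elements
    return {
        # 1. Chemical diversity: number of distinct elements in training data
        "chemical_diversity": len(train_elements),
        # 2. System size: average number of atoms per training sample
        "mean_system_size": mean(len(sample) for sample in train),
        # 3. Dataset size: number of training samples
        "dataset_size": len(train),
        # 4. Domain shift: crude proxy (an assumption, not the paper's measure),
        #    the fraction of test-set elements absent from the training set
        "unseen_element_fraction": len(unseen) / len(test_elements),
    }


# Hypothetical toy data, invented for illustration only
train = [["Cu", "Cu", "C", "O"], ["Pt", "Pt", "Pt", "H"], ["Cu", "O", "H"]]
test = [["Cu", "C", "O"], ["Ni", "Ni", "O", "H"]]

summary = complexity_summary(train, test)
print(summary)
```

On real datasets the same counts would be taken over many more samples, and domain shift would likely need a richer measure than element overlap, for example differences in composition or structure between the training and test splits.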
Taxonomy
Topics: Machine Learning in Materials Science · Topic Modeling · Advanced Graph Neural Networks
