The Open Catalyst 2020 (OC20) Dataset and Community Challenges
Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, Aini Palizhati, Anuroop Sriram, Brandon Wood, Junwoong Yoon, Devi Parikh, C. Lawrence Zitnick, Zachary Ulissi

TL;DR
The paper introduces the OC20 dataset with over 1.2 million DFT relaxations for catalyst modeling, along with baseline models and community challenges to advance machine learning in catalysis discovery.
Contribution
It provides a large, diverse dataset and baseline models for catalyst discovery, enabling generalization across compositions and configurations, and fostering community-driven progress.
Findings
Baseline models leave considerable room for improvement, and larger models perform better.
The dataset covers a wide range of materials and adsorbates.
Open resources and leaderboards encourage community contributions.
Abstract
Catalyst discovery and optimization is key to solving many societal and energy challenges including solar fuels synthesis, long-term energy storage, and renewable fertilizer production. Despite considerable effort by the catalysis community to apply machine learning models to the computational catalyst discovery process, it remains an open challenge to build models that can generalize across both elemental compositions of surfaces and adsorbate identity/configurations, perhaps because datasets have been smaller in catalysis than related fields. To address this we developed the OC20 dataset, consisting of 1,281,040 Density Functional Theory (DFT) relaxations (~264,890,000 single point evaluations) across a wide swath of materials, surfaces, and adsorbates (nitrogen, carbon, and oxygen chemistries). We supplemented this dataset with randomly perturbed structures, short timescale molecular…
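The two totals quoted in the abstract imply an average trajectory length. The sketch below is a back-of-the-envelope check, not a figure reported by the authors. It treats the approximate ("~") single-point total as exact and says nothing about how relaxation lengths are distributed.

```python
# Totals quoted in the abstract.
num_relaxations = 1_281_040
num_single_points = 264_890_000  # approximate ("~") in the abstract

# Average number of DFT single-point evaluations per relaxation trajectory.
avg_steps = num_single_points / num_relaxations
print(f"{avg_steps:.1f}")  # roughly 207 single points per relaxation
```

Individual relaxations can be much shorter or longer than this mean.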
Taxonomy
Methods: Graph Neural Network · Shifted Softplus · Schrödinger Network
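Shifted softplus, one of the methods listed above, is the activation used in SchNet-style continuous-filter networks. It is defined as ssp(x) = ln(0.5·e^x + 0.5), which equals softplus(x) − ln 2, so ssp(0) = 0. The NumPy sketch below illustrates that definition only. It is not taken from the OC20 codebase, and the function name `shifted_softplus` is a hypothetical choice.

```python
import numpy as np

def shifted_softplus(x):
    """Shifted softplus: ln(0.5 * exp(x) + 0.5) == softplus(x) - ln(2).

    np.logaddexp(0, x) computes ln(1 + exp(x)) without overflowing for large x.
    Subtracting ln(2) shifts the curve so that it passes through the origin.
    """
    x = np.asarray(x, dtype=float)
    return np.logaddexp(0.0, x) - np.log(2.0)

print(shifted_softplus([-2.0, 0.0, 2.0, 1000.0]))
```

A plain softplus outputs ln 2 at zero. The shift is what makes the activation output 0 at x = 0 while staying smooth, and for large inputs it behaves like x − ln 2.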
