On Graph Neural Network Ensembles for Large-Scale Molecular Property Prediction
Edward Elson Kosasih, Joaquin Cabezas, Xavier Sumba, Piotr Bielak, Kamil Tagowski, Kelvin Idanwekhai, Benedict Aaron Tjandra, Arian Rokkum Jamasb

TL;DR
This paper presents an ensemble of three graph neural network models for large-scale molecular property prediction. The ensemble outperforms the provided baseline by 7.6%, and the uncertainty in its predictions helps identify molecules whose properties are harder to predict.
Contribution
It introduces a novel ensemble approach combining GIN, Bayesian Neural Networks, and DiffPool for large-scale molecular property prediction.
Findings
Ensemble outperforms baseline by 7.6%
Uncertainty helps identify molecules with harder-to-predict properties
Uncertainty-based identification of harder-to-predict molecules reaches a Pearson's correlation of 0.5181
Abstract
To advance large-scale graph machine learning, the Open Graph Benchmark Large Scale Challenge (OGB-LSC) was proposed at the KDD Cup 2021. The PCQM4M-LSC dataset defines a molecular HOMO-LUMO property prediction task on about 3.8M graphs. In this short paper, we present our current work-in-progress solution, which builds an ensemble of three graph neural network models based on GIN, Bayesian Neural Networks and DiffPool. Our approach outperforms the provided baseline by 7.6%. Moreover, using the uncertainty in our ensemble's predictions, we can identify molecules whose HOMO-LUMO gaps are harder to predict (with Pearson's correlation of 0.5181). We anticipate that this will facilitate active learning.
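The link between ensemble uncertainty and prediction difficulty described in the abstract can be illustrated with a minimal sketch. The abstract does not say how uncertainty is computed or which quantities enter the Pearson's correlation, so this example assumes the uncertainty is the standard deviation of the member predictions and that it is correlated against the absolute error of the ensemble mean. The data is synthetic, and the function names `ensemble_predict` and `pearson` are hypothetical, not taken from the paper.

```python
import numpy as np


def ensemble_predict(member_preds):
    """Combine per-model predictions of shape (n_models, n_molecules).

    Returns the ensemble mean and, as an ASSUMED uncertainty measure,
    the standard deviation across the ensemble members.
    """
    member_preds = np.asarray(member_preds, dtype=float)
    return member_preds.mean(axis=0), member_preds.std(axis=0)


def pearson(x, y):
    """Pearson's correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


# Synthetic stand-in data (NOT from PCQM4M-LSC): three "models" whose
# noise grows with a hidden per-molecule difficulty.
rng = np.random.default_rng(0)
n_molecules = 1000
true_gap = rng.normal(5.0, 1.0, size=n_molecules)
difficulty = rng.uniform(0.05, 0.5, size=n_molecules)
member_preds = true_gap + rng.normal(size=(3, n_molecules)) * difficulty

mean_pred, uncertainty = ensemble_predict(member_preds)
abs_error = np.abs(mean_pred - true_gap)

# How well does the disagreement between members track the actual error?
r = pearson(uncertainty, abs_error)
print(f"Pearson's r between uncertainty and |error|: {r:.4f}")
```

In an active-learning setting, one plausible use of such a score is to rank unlabeled molecules by `uncertainty` and label the highest-ranked ones first. This is a design assumption for illustration, not a procedure stated in the source.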
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Advanced Graph Neural Networks
Methods: DiffPool · Graph Isomorphism Network
