Heterogenous Ensemble of Models for Molecular Property Prediction

Sajad Darabi; Shayan Fazeli; Jiwei Liu; Alexandre Milesi; Pawel Morkisz; Jean-François Puget; Gilberto Titericz

arXiv:2211.11035 · cs.LG · November 22, 2022

TL;DR

This paper presents a heterogeneous ensemble that combines Transformer-M variants with Transformer, GNN, and ResNet models trained on the 2D, 3D, and image modalities of molecules; the ensemble is the winning solution to the PCQM4Mv2 molecular property prediction track of the OGB Large-Scale Challenge 2022.

Contribution

It introduces an ensemble that integrates diverse model architectures and molecular data modalities, blends them with a HuberRegressor, and won the PCQM4Mv2 track of the 2nd OGB Large-Scale Challenge (2022).

Findings

01. Achieved a test-challenge MAE of 0.0723 (validation MAE 0.07145) on the PCQM4Mv2 dataset.
02. Ensembling models trained on multiple modalities improves prediction accuracy.
03. Total inference time for the entire solution is under 2 hours.
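For reference, the MAE figures quoted here are the standard mean absolute error over the N evaluated molecules, where y_i is the DFT-computed HOMO-LUMO energy gap (in eV) that PCQM4Mv2 asks models to predict and ŷ_i is the model's prediction for molecule i:

```latex
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|
```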

Abstract

Previous works have demonstrated the importance of considering different modalities of molecules, each of which provides a different granularity of information for downstream property prediction tasks. Our method combines variants of the recent Transformer-M architecture with Transformer, GNN, and ResNet backbone architectures. Models are trained on the 2D, 3D, and image modalities of the molecules. We ensemble these models with a HuberRegressor. The models are trained on 4 different train/validation splits of the original train + valid datasets. This yields a winning solution to the 2nd edition of the OGB Large-Scale Challenge (2022) on the PCQM4Mv2 molecular property prediction dataset. Our proposed method achieves a test-challenge MAE of 0.0723 and a validation MAE of 0.07145. Total inference time for our solution is less than 2 hours. We open-source our…
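The abstract states that base models are trained on 4 train/validation splits of the pooled train + valid data and then ensembled with a HuberRegressor (scikit-learn's robust linear regressor, whose loss is quadratic for small residuals and linear beyond the threshold `epsilon`). The sketch below shows one common way to wire this up: stacking on out-of-fold predictions. It is illustrative only. The synthetic data, the three hypothetical "views" (`view_2d`, `view_3d`, `view_img`), and the Ridge/KNeighborsRegressor base learners stand in for the real 2D/3D/image models, and fitting the blender on out-of-fold predictions is an assumption, not a detail confirmed by the abstract.

```python
# Minimal stacking sketch. Assumptions: synthetic data, hypothetical "views", and
# scikit-learn base learners standing in for the Transformer-M/GNN/ResNet models.
import numpy as np
from sklearn.linear_model import HuberRegressor, Ridge
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.neighbors import KNeighborsRegressor


def make_toy_views(n=2000, seed=0):
    """Hypothetical data: one regression target observed through three noisy 'modalities'."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, 4))  # latent factors shared by all views
    y = z @ np.array([1.0, -0.5, 0.3, 0.8]) + 0.05 * rng.normal(size=n)
    noise = {"view_2d": 0.5, "view_3d": 0.6, "view_img": 0.7}  # per-view noise level
    views = {name: z + s * rng.normal(size=z.shape) for name, s in noise.items()}
    return views, y


def out_of_fold(model, X, y, splits):
    """Fit `model` once per split and predict only that split's held-out rows."""
    oof = np.empty_like(y)
    for train_idx, valid_idx in splits:
        model.fit(X[train_idx], y[train_idx])
        oof[valid_idx] = model.predict(X[valid_idx])
    return oof


def mae(y_true, y_pred):
    """Mean absolute error, the PCQM4Mv2 metric."""
    return float(np.mean(np.abs(y_true - y_pred)))


views, y = make_toy_views()

# Four train/validation splits of the pooled data (mirrors the abstract's "4 splits").
splits = list(KFold(n_splits=4, shuffle=True, random_state=0).split(views["view_2d"]))

# Heterogeneous base learners, one per hypothetical view.
base_learners = {
    "view_2d": Ridge(alpha=1.0),
    "view_3d": Ridge(alpha=1.0),
    "view_img": KNeighborsRegressor(n_neighbors=30),
}
oof = np.column_stack(
    [out_of_fold(model, views[name], y, splits) for name, model in base_learners.items()]
)

# Robust linear blender over the stacked out-of-fold predictions.
blender = HuberRegressor(epsilon=1.35, max_iter=1000)
blend_oof = cross_val_predict(blender, oof, y, cv=splits)  # blender scored out-of-sample
blender.fit(oof, y)  # final blend weights, refit on all out-of-fold predictions

base_maes = {name: mae(y, oof[:, i]) for i, name in enumerate(base_learners)}
blend_mae = mae(y, blend_oof)
print("base MAEs:", base_maes)
print("blend MAE:", blend_mae, "weights:", blender.coef_)
```

Because each base model only predicts rows held out from its own training split, the blender is fit on predictions of the same quality it would see on unseen molecules; scoring the blender with `cross_val_predict` over the same splits keeps its reported MAE out-of-sample as well.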

Code & Models

Repositories

jfpuget/nvidia-pcqm4mv2 (PyTorch, official)

Taxonomy

Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Protein Structure and Dynamics

Methods: Multi-Head Attention · Attention Is All You Need · Masked Autoencoder · Softmax · Layer Normalization · Batch Normalization · Adam · Linear Layer · Dense Connections · Residual Connection