On Improving Graph Neural Networks for QSAR by Pre-training on Extended-Connectivity Fingerprints
Sam Money-Kyrle, Markus Dablander, Thierry Hanser, Stephane Werner, Charlotte M. Deane, Garrett M. Morris

TL;DR
Pre-training Graph Neural Networks to predict Extended-Connectivity Fingerprints yields statistically significant QSAR improvements on five of six Biogen benchmarks, but the pre-trained models underperform in out-of-distribution settings on more heterogeneous datasets and more complex endpoints such as binding affinity.
Contribution
Proposes a general pre-training strategy in which GNNs for QSAR learn to predict ECFPs, validates it with statistical tests and out-of-distribution splits, and investigates the impact of substructure-level data leakage.
Findings
Statistically significant gains over all evaluated baselines on five of six Biogen benchmarks with ECFP pre-training.
Pre-trained GNNs underperform in OOD settings on more heterogeneous datasets and more complex endpoints, such as binding affinity prediction.
Data leakage at substructure level impacts the effectiveness of pre-training.
Abstract
Molecular Graph Neural Networks (GNNs) are increasingly common in drug discovery, particularly for Quantitative Structure-Activity Relationship (QSAR) studies; yet, their superiority compared to classical molecular featurisation approaches is disputed. We report a general strategy for improving GNNs for QSAR by pre-training to predict Extended-Connectivity Fingerprints (ECFP). We validate our approach with statistical tests and challenging out-of-distribution (OOD) splits. Across five out of six Biogen benchmarks, we observed a statistically significant improvement in standard performance metrics over all evaluated baselines when using ECFP pre-trained GNNs. However, for more heterogeneous datasets and more complex endpoints, such as binding affinity prediction, pre-trained GNNs underperformed in OOD settings. Importantly, we investigated the impact of substructure-level data leakage…
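To make the pre-training idea concrete, here is a minimal sketch of what "predicting a fingerprint" could look like as a training target. Everything in it is an assumption for illustration: `toy_circular_fingerprint` is a simplified Morgan-style hashed fingerprint written from scratch (not RDKit's ECFP and not the authors' code), the radius and bit length are arbitrary, and the binary cross-entropy loss is one plausible choice of objective that this summary does not confirm. In a real setup, the logits would come from the GNN's readout layer.

```python
import hashlib
import math


def toy_circular_fingerprint(atoms, bonds, radius=2, n_bits=64):
    """Simplified Morgan-style hashed bit vector (illustrative, NOT real ECFP).

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) index pairs for bonded atoms
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    def stable_hash(text):
        # hashlib keeps the result identical across runs, unlike built-in hash()
        return int(hashlib.sha256(text.encode()).hexdigest(), 16)

    # Radius 0: each atom is identified by its element symbol alone.
    identifiers = [stable_hash(symbol) for symbol in atoms]
    bits = [0] * n_bits
    for ident in identifiers:
        bits[ident % n_bits] = 1

    # Each iteration folds the neighbours' identifiers into every atom's own,
    # so an identifier describes a circular environment one bond larger.
    for _ in range(radius):
        updated = []
        for i in range(len(atoms)):
            environment = sorted(identifiers[j] for j in neighbours[i])
            updated.append(stable_hash(f"{identifiers[i]}|{environment}"))
        identifiers = updated
        for ident in identifiers:
            bits[ident % n_bits] = 1
    return bits


def bce_loss(logits, target_bits):
    """Mean binary cross-entropy between per-bit logits and fingerprint bits."""
    total = 0.0
    for z, y in zip(logits, target_bits):
        # Numerically stable form of -[y*log(s(z)) + (1-y)*log(1-s(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(target_bits)


# Ethanol heavy atoms: C-C-O
target = toy_circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])

# Stand-ins for GNN outputs: an uninformed model and a nearly perfect one.
uninformed_logits = [0.0] * len(target)
confident_logits = [10.0 if bit else -10.0 for bit in target]

print("uninformed loss:", bce_loss(uninformed_logits, target))
print("confident loss: ", bce_loss(confident_logits, target))
```

The appeal of such a target is that it needs no experimental labels: any molecule with a known structure supplies a supervision signal, and the model is pushed to encode the same local substructures that fingerprint-based baselines rely on.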
