Improving generalisability of 3D binding affinity models in low data regimes
Julia Buhmann, Ward Haddadin, Lukáš Pravda, Alan Bilsland, Hagen Triendl

TL;DR
This paper investigates the generalisability of 3D binding affinity models in low data settings, proposing new dataset splits and training strategies to improve model performance, especially for GNN architectures.
Contribution
Introduces a novel dataset split minimizing similarity leakage and three pre-training techniques to enhance GNN performance in low data regimes.
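To illustrate what a split that limits similarity leakage can look like, here is a minimal sketch. It is not the authors' procedure: the feature sets, the Tanimoto measure, the 0.4 threshold, and the greedy cluster assignment are all assumptions chosen for illustration, and the function names are hypothetical. The idea is to cluster items that are similar above a threshold and keep each cluster entirely in train or entirely in test.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_clusters(features: dict, threshold: float) -> list:
    """Connected components of the graph linking any two items whose
    similarity is at or above `threshold` (union-find)."""
    parent = {k: k for k in features}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(features, 2):
        if tanimoto(features[a], features[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for k in features:
        clusters.setdefault(find(k), []).append(k)
    return list(clusters.values())


def low_similarity_split(features, threshold=0.4, test_fraction=0.2):
    """Assign whole clusters to test, smallest first, while the test set
    stays within the target size; all remaining clusters go to train.
    The threshold and fraction defaults are illustrative assumptions."""
    clusters = sorted(similarity_clusters(features, threshold), key=len)
    target = test_fraction * len(features)
    train, test = [], []
    for cluster in clusters:
        if len(test) + len(cluster) <= target:
            test.extend(cluster)
        else:
            train.extend(cluster)
    return train, test
```

Because clusters are never divided, no train item is similar to a test item at or above the threshold. In practice the feature sets would stand in for whatever similarity measure is relevant, for example ligand fingerprints or protein sequence identity.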
Findings
3D global models outperform local models in low data regimes
Supervised pre-training with quantum data improves GNN accuracy
Explicit hydrogen modeling benefits GNN performance
Abstract
Predicting protein-ligand binding affinity is an essential part of computer-aided drug design. However, generalisable and performant global binding affinity models remain elusive, particularly in low data regimes. Despite the evolution of model architectures, current benchmarks are not well-suited to probe the generalisability of 3D binding affinity models. Furthermore, 3D global architectures such as GNNs have not lived up to performance expectations. To investigate these issues, we introduce a novel split of the PDBBind dataset, minimizing similarity leakage between train and test sets and allowing for a fair and direct comparison between various model architectures. On this low similarity split, we demonstrate that, in general, 3D global models are superior to protein-specific local models in low data regimes. We also demonstrate that the performance of GNNs benefits from three novel…
Taxonomy
Topics: Protein Structure and Dynamics · Bioinformatics and Genomic Networks · Glycosylation and Glycoproteins Research
