Decomposing the Generalization Gap in PROTAC Activity Prediction: Variance Attribution and the Inter-Laboratory Ceiling
Thor Klamt, Wolfgang Nejdl, Ming Tang

TL;DR
This paper decomposes the gap between random-split and leave-one-target-out (LOTO) performance in PROTAC activity prediction, identifies inter-laboratory measurement variance as the dominant component, and proposes methods to mitigate it.
Contribution
It introduces a variance-decomposition framework for understanding generalization gaps and demonstrates how measurement variance limits predictive performance in PROTAC activity models.
Findings
Inter-laboratory measurement variance dominates the generalization gap.
Hyperparameter tuning cannot surpass the performance ceiling set by measurement variance.
Few-shot learning and calibration improve target-specific AUROC scores.
Abstract
Machine-learning predictors of biochemical activity often exhibit large random-split-to-leave-one-target-out generalisation gaps that have been documented but not decomposed. We frame this as an evaluation-science question and use targeted protein degradation as the empirical test bed. PROTACs (proteolysis-targeting chimeras) are heterobifunctional small molecules that induce targeted protein degradation, with more than forty candidates currently in clinical trials; published predictors report AUROC of 0.85 to 0.91 under random-split cross-validation, while the leave-one-target-out (LOTO) protocol of Ribes et al. reduces performance to approximately 0.67. Random splits reward within-target interpolation, whereas LOTO measures the novel-target prediction that de-novo design depends on. We decompose this gap and identify inter-laboratory measurement variance as the dominant component,…
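The difference between the two evaluation protocols named in the abstract can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the authors' code or the protocol implementation of Ribes et al.: the record layout, the target labels `T_A`, `T_B`, `T_C`, and the function names `random_split` and `leave_one_target_out` are all hypothetical, and the toy records carry no real measurements.

```python
import random
from collections import defaultdict

# Toy records (hypothetical): each row is one compound measured against one target.
records = [
    {"compound": "c1", "target": "T_A", "active": 1},
    {"compound": "c2", "target": "T_A", "active": 0},
    {"compound": "c3", "target": "T_A", "active": 1},
    {"compound": "c4", "target": "T_B", "active": 0},
    {"compound": "c5", "target": "T_B", "active": 1},
    {"compound": "c6", "target": "T_C", "active": 1},
    {"compound": "c7", "target": "T_C", "active": 0},
    {"compound": "c8", "target": "T_C", "active": 0},
]


def random_split(records, test_fraction=0.2, seed=0):
    """Hold out a random fraction of rows, ignoring which target each row belongs to."""
    rng = random.Random(seed)
    indices = list(range(len(records)))
    rng.shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    return indices[n_test:], indices[:n_test]


def leave_one_target_out(records):
    """Yield (held_out_target, train_indices, test_indices), one fold per target."""
    by_target = defaultdict(list)
    for index, record in enumerate(records):
        by_target[record["target"]].append(index)
    for target in sorted(by_target):
        test_idx = by_target[target]
        # Every row of the held-out target is excluded from training.
        train_idx = [i for i, r in enumerate(records) if r["target"] != target]
        yield target, train_idx, test_idx


if __name__ == "__main__":
    train_idx, test_idx = random_split(records, test_fraction=0.25)
    seen = {records[i]["target"] for i in train_idx}
    shared = {records[i]["target"] for i in test_idx} & seen
    print("random split: test targets also seen in training:", sorted(shared))

    for target, train_idx, test_idx in leave_one_target_out(records):
        print(f"LOTO fold {target}: {len(train_idx)} train rows, {len(test_idx)} test rows")
```

In a random split, a test row's target usually also appears in the training set, so the model is scored on within-target interpolation. Each LOTO fold removes every row of one target from training, so the score reflects prediction on a target the model has not seen.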
