Training distribution determines the ceiling of drug-blind cancer sensitivity prediction
Taekyung Heo

TL;DR
This study reveals that the perceived stagnation in drug-blind cancer sensitivity prediction is due to metric artifacts and shows that stratified training and response matching can significantly improve predictive performance.
Contribution
The paper demonstrates that standard metrics obscure true predictive capabilities and introduces strategies to recover meaningful predictive signals in drug-blind cancer sensitivity prediction.
Findings
Global Pearson r is dominated by between-drug differences, not cell-specific signals.
Per-drug Pearson r reveals no benefit from drug encoding over cell features alone.
Mechanism-based stratification improves per-drug prediction accuracy for targeted kinase inhibitors.
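The first two findings can be illustrated with a small synthetic sketch. It is not the paper's code or data: the potency spread, the noise scale, and the helper names pearson_r and global_and_per_drug_r are all assumptions made for illustration. The drug-mean predictor here uses each drug's own observed mean, which is an oracle simplification. The point is only that a predictor with no cell-specific signal can score a high global Pearson r, while per-drug Pearson r exposes that it ranks no cells at all.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation; NaN when either input is constant."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.max() == x.min() or y.max() == y.min():
        return float("nan")
    return float(np.corrcoef(x, y)[0, 1])

def global_and_per_drug_r(y_true, y_pred, drug_ids):
    """Return (global r over all pairs, array of r computed within each drug)."""
    global_r = pearson_r(y_true, y_pred)
    per_drug = np.array([
        pearson_r(y_true[drug_ids == d], y_pred[drug_ids == d])
        for d in np.unique(drug_ids)
    ])
    return global_r, per_drug

# Synthetic responses (assumed scales): large between-drug potency differences,
# small within-drug cell-specific effects.
rng = np.random.default_rng(0)
n_drugs, n_cells = 20, 50
potency = np.linspace(-3.0, 3.0, n_drugs)
cell_effect = rng.normal(scale=0.5, size=(n_drugs, n_cells))
response = potency[:, None] + cell_effect
y_true = response.ravel()
drug_ids = np.repeat(np.arange(n_drugs), n_cells)

# Predictor 1: each drug's mean response, identical for every cell line.
drug_mean_pred = np.repeat(response.mean(axis=1), n_cells)
# Predictor 2: perfect within-drug ranking, but no knowledge of drug potency.
cell_only_pred = cell_effect.ravel()

mean_global, mean_per_drug = global_and_per_drug_r(y_true, drug_mean_pred, drug_ids)
cell_global, cell_per_drug = global_and_per_drug_r(y_true, cell_only_pred, drug_ids)

print(f"drug-mean predictor: global r = {mean_global:.2f}, per-drug r undefined (constant)")
print(f"cell-only predictor: global r = {cell_global:.2f}, "
      f"mean per-drug r = {cell_per_drug.mean():.2f}")
```

Under these assumptions the two metrics rank the two predictors in opposite order, which is why the choice of metric matters when the goal is ranking cell lines within a drug.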
Abstract
Precision oncology requires predicting which drugs will suppress a specific tumor from its molecular profile, but drug-blind sensitivity prediction has plateaued despite increasingly complex drug representations. Here we show that this stagnation reflects a metric artifact rather than a representational bottleneck. The standard benchmark, global Pearson r, is dominated by between-drug potency differences that a trivial drug-mean predictor captures without any cell-specific learning. Per-drug Pearson r, which isolates within-drug cell ranking, reveals that no drug encoding improves over cell-only features across four independent datasets. A controlled experiment channeling mechanism-of-action identity as either a drug feature or a training-distribution constraint identifies the cause. Supplying MoA as a feature yields negligible benefit, whereas using it to stratify training raises…
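The idea of using mechanism of action as a training-distribution constraint rather than as a feature can be sketched on synthetic data. This is a toy under strong assumptions, not the paper's experiment: two hypothetical MoA classes are given exactly opposite dependence on the cell features (an extreme case chosen to make the effect visible), the model is ordinary least squares on cell features only, and the held-out drug is scored on the same cell lines used in training. All names (simulate_drug, fit_cell_only, w_a, w_b) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_feat, n_train_per_moa = 200, 10, 5

X = rng.normal(size=(n_cells, n_feat))   # shared cell-line features
w_a = rng.normal(size=n_feat)            # hypothetical MoA "A" sensitivity direction
w_b = -w_a                               # MoA "B": opposite dependence (extreme assumption)

def simulate_drug(w):
    """One drug's responses: cell signal + drug-level potency offset + noise."""
    potency = rng.normal(scale=2.0)
    return X @ w + potency + rng.normal(scale=0.3, size=n_cells)

train_a = [simulate_drug(w_a) for _ in range(n_train_per_moa)]
train_b = [simulate_drug(w_b) for _ in range(n_train_per_moa)]
held_out = simulate_drug(w_a)            # unseen drug belonging to MoA "A"

design = np.hstack([X, np.ones((n_cells, 1))])  # cell features plus intercept

def fit_cell_only(responses):
    """Least-squares fit of one cell-only model to all (drug, cell) pairs given."""
    A = np.vstack([design] * len(responses))
    y = np.concatenate(responses)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def per_drug_r(coef):
    """Pearson r between the held-out drug's responses and the model's predictions."""
    return float(np.corrcoef(held_out, design @ coef)[0, 1])

pooled_r = per_drug_r(fit_cell_only(train_a + train_b))  # train on every drug
strat_r = per_drug_r(fit_cell_only(train_a))             # train on same-MoA drugs only

print(f"pooled training:     per-drug r = {pooled_r:.2f}")
print(f"stratified training: per-drug r = {strat_r:.2f}")
```

In this construction the pooled model sees cell-feature effects that cancel across the two classes, so restricting the training set to the matching class is what recovers the within-drug ranking. Real MoA classes are unlikely to be this cleanly opposed, so the size of the gap here says nothing about the gains reported in the paper.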
