Training distribution determines the ceiling of drug-blind cancer sensitivity prediction

Taekyung Heo

arXiv:2605.20885·cs.LG·May 21, 2026

TL;DR

This study shows that the apparent plateau in drug-blind cancer sensitivity prediction is a metric artifact rather than a representational bottleneck, and that stratified training and response matching can substantially improve predictive performance.

Contribution

The paper shows that the standard metric, global Pearson r, obscures how well models rank cell lines within each drug, and introduces training-distribution strategies that recover cell-specific predictive signal in drug-blind cancer sensitivity prediction.

Findings

01

Global Pearson r is dominated by between-drug differences, not cell-specific signals.

02

Per-drug Pearson r reveals no benefit from drug encoding over cell features alone, across four independent datasets.

03

Mechanism-based stratification improves per-drug prediction accuracy for targeted kinase inhibitors.
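
The gap between global and per-drug Pearson r described in Findings 01 and 02 can be illustrated on synthetic data. The Python sketch below is a toy under stated assumptions, not the paper's evaluation code: responses are simulated as a drug-level potency offset plus a cell-specific term plus noise (the scales 3.0, 1.0 and 0.5 are arbitrary), per-drug r is taken as the unweighted mean of within-drug correlations (the paper's exact aggregation may differ), and the helpers `pearson` and `per_drug_pearson` are hypothetical names. A drug-mean predictor, with drug means computed from the same simulated responses purely to isolate the metric's behaviour, scores high on global r while offering no within-drug ranking; an oracle that knows only the cell-specific term shows the reverse pattern.

```python
import numpy as np


def pearson(x, y):
    """Pearson r of two 1-D arrays; NaN when either array is constant (undefined)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.max() == x.min() or y.max() == y.min():
        return float("nan")
    return float(np.corrcoef(x, y)[0, 1])


def per_drug_pearson(drug_ids, y_true, y_pred):
    """Unweighted mean of within-drug Pearson r, skipping drugs where r is undefined."""
    drug_ids = np.asarray(drug_ids)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rs = [pearson(y_true[drug_ids == d], y_pred[drug_ids == d])
          for d in np.unique(drug_ids)]
    rs = [r for r in rs if not np.isnan(r)]
    return float(np.mean(rs)) if rs else float("nan")


# Hypothetical toy data (not the paper's): response = potency + cell term + noise.
rng = np.random.default_rng(0)
n_drugs, n_cells = 50, 200
potency = rng.normal(0.0, 3.0, size=n_drugs)
cell_term = rng.normal(0.0, 1.0, size=(n_drugs, n_cells))
y = potency[:, None] + cell_term + rng.normal(0.0, 0.5, size=(n_drugs, n_cells))

drug_ids = np.repeat(np.arange(n_drugs), n_cells)  # matches row-major ravel()
y_flat = y.ravel()

# Predictor A: each drug's mean response, the same for every cell line.
drug_mean_pred = np.repeat(y.mean(axis=1), n_cells)
# Predictor B: the true cell-specific term only, with no drug information.
cell_only_pred = cell_term.ravel()

# Global r pools all (drug, cell) pairs; per-drug r only scores the cell ranking.
global_r_mean = pearson(y_flat, drug_mean_pred)
per_drug_r_mean = per_drug_pearson(drug_ids, y_flat, drug_mean_pred)
global_r_cell = pearson(y_flat, cell_only_pred)
per_drug_r_cell = per_drug_pearson(drug_ids, y_flat, cell_only_pred)

print(f"drug-mean: global r = {global_r_mean:.2f}, per-drug r = {per_drug_r_mean}")
print(f"cell-only: global r = {global_r_cell:.2f}, per-drug r = {per_drug_r_cell:.2f}")
```

Because the drug-mean predictor is constant within each drug, its per-drug r is undefined and the sketch reports NaN; some pipelines instead score such drugs as r = 0. Either convention leads to the same reading: its high global r comes entirely from between-drug differences, with no cell-specific learning.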

Abstract

Precision oncology requires predicting which drugs will suppress a specific tumor from its molecular profile, but drug-blind sensitivity prediction has plateaued despite increasingly complex drug representations. Here we show that this stagnation reflects a metric artifact rather than a representational bottleneck. The standard benchmark, global Pearson r, is dominated by between-drug potency differences that a trivial drug-mean predictor captures without any cell-specific learning. Per-drug Pearson r, which isolates within-drug cell ranking, reveals that no drug encoding improves over cell-only features across four independent datasets. A controlled experiment channeling mechanism-of-action (MoA) identity as either a drug feature or a training-distribution constraint identifies the cause. Supplying MoA as a feature yields negligible benefit, whereas using it to stratify training raises…
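
The two ways of channeling MoA identity can be contrasted in a small linear toy. Everything in the sketch below is an assumption for illustration and not the paper's setup: the synthetic data, the two MoA labels `"A"` and `"B"`, the premise that each MoA responds to a different half of the cell features, ordinary least squares as the model, and the hypothetical helpers `fit_linear`, `predict_linear`, `pooled_features` and `mean_per_drug_r`. Evaluation drugs are never seen in training (drug-blind), and both options are scored with per-drug Pearson r.

```python
import numpy as np

# Hypothetical toy setup (not the paper's data or models).
rng = np.random.default_rng(1)
n_cells, n_feat, drugs_per_moa, n_test_per_moa = 300, 10, 20, 5
X = rng.normal(size=(n_cells, n_feat))  # cell-line features, shared by all drugs

# Assumed ground truth: MoA "A" responds to features 0-4, MoA "B" to features 5-9.
w = {"A": np.r_[np.ones(5), np.zeros(5)], "B": np.r_[np.zeros(5), np.ones(5)]}
moas = ["A"] * drugs_per_moa + ["B"] * drugs_per_moa
potency = rng.normal(0.0, 2.0, size=len(moas))  # drug-level offsets
Y = np.stack([potency[d] + X @ w[m] + rng.normal(0.0, 1.0, size=n_cells)
              for d, m in enumerate(moas)])  # shape (n_drugs, n_cells)

# Drug-blind split: the last n_test_per_moa drugs of each MoA are held out.
test = [d for d in range(len(moas))
        if d % drugs_per_moa >= drugs_per_moa - n_test_per_moa]
train = [d for d in range(len(moas)) if d not in test]


def fit_linear(features, target):
    """Ordinary least squares with an intercept; returns [intercept, weights...]."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict_linear(coef, features):
    return np.column_stack([np.ones(len(features)), features]) @ coef


def pooled_features(d):
    # MoA as a drug feature: an indicator column appended to every cell row.
    return np.column_stack([X, np.full(n_cells, float(moas[d] == "B"))])


def mean_per_drug_r(predict):
    # Average within-drug Pearson r over the held-out drugs only.
    return float(np.mean([np.corrcoef(Y[d], predict(d))[0, 1] for d in test]))


# Option 1: one pooled model over all training drugs, MoA supplied as an input.
coef_pooled = fit_linear(np.vstack([pooled_features(d) for d in train]),
                         np.concatenate([Y[d] for d in train]))

# Option 2: MoA as a training-distribution constraint, one model per MoA group.
coef_strat = {m: fit_linear(np.vstack([X for d in train if moas[d] == m]),
                            np.concatenate([Y[d] for d in train if moas[d] == m]))
              for m in ("A", "B")}

r_feature = mean_per_drug_r(lambda d: predict_linear(coef_pooled, pooled_features(d)))
r_stratified = mean_per_drug_r(lambda d: predict_linear(coef_strat[moas[d]], X))
print(f"MoA as feature: per-drug r = {r_feature:.2f}")
print(f"MoA-stratified: per-drug r = {r_stratified:.2f}")
```

The contrast rests on a property of purely additive models: a drug-level input such as an MoA indicator is constant across the cell lines of one drug, so it shifts every prediction for that drug by the same amount and cannot change the within-drug ranking, whereas fitting one model per MoA lets the cell-feature weights themselves differ by mechanism. Nonlinear models with drug-cell interactions are not bound by this argument, so the toy illustrates one possible mechanism rather than reproducing the paper's result.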
