Learning Curves for Drug Response Prediction in Cancer Cell Lines
Alexander Partin (1, 2), Thomas Brettin (2, 3), Yvonne A. Evrard (4), Yitan Zhu (1, 2), Hyunseung Yoo (1, 2), Fangfang Xia (1, 2), Songhao Jiang (7), Austin Clyde (1, 7), Maulik Shukla (1, 2), Michael Fonstein (5), James H. Doroshow (6), Rick Stevens (3, 7) ((1) Division of

TL;DR
This study uses empirical learning curves to evaluate how different machine learning models, including neural networks and gradient boosting decision trees, scale with increasing data for predicting drug response in cancer cell lines, revealing model-dataset interactions and guiding future data collection.
Contribution
It introduces a power law fitting framework for learning curves in drug response prediction, comparing neural networks and GBDT models across multiple datasets and training sizes.
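As a rough illustration of what fitting a learning curve to a power law involves, the sketch below fits error ≈ a · m^b (with b < 0) to pairs of training size m and prediction error, using ordinary least squares in log-log space. The two-parameter form, the function names `fit_power_law` and `predict_error`, and the synthetic numbers are assumptions made for this example. The paper's exact functional form and fitting procedure may differ, for instance by including an offset term for irreducible error.

```python
import math


def fit_power_law(train_sizes, errors):
    """Fit errors ~ a * m**b by least squares on (log m, log error).

    Assumes all sizes and errors are strictly positive.
    Returns the tuple (a, b).
    """
    xs = [math.log(m) for m in train_sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope and intercept of the straight line in log-log space.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(y_mean - b * x_mean)
    return a, b


def predict_error(a, b, m):
    """Evaluate the fitted power law at training size m."""
    return a * m ** b


# Synthetic, noise-free learning curve (not data from the paper).
sizes = [1000, 2000, 4000, 8000, 16000]
errors = [2.0 * m ** -0.3 for m in sizes]

a, b = fit_power_law(sizes, errors)
# Extrapolate to a training size larger than any in the fit.
extrapolated = predict_error(a, b, 32000)
```

With a fit like this, comparing the fitted exponents of two models gives a quick sense of which one is expected to benefit more from additional training data, although extrapolating far beyond the measured sizes should be treated with caution.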
Findings
Multi-input neural networks outperform single-input models at larger data sizes.
Gradient boosting decision trees perform better at smaller training sizes.
Increasing data size is likely to improve model prediction performance.
Abstract
Motivated by the size of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating data, a common question is whether the proposed predictors can further improve the generalization performance with more training data. We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these predictors. The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, suggesting that the shape of these curves…
