Learning Curves for Drug Response Prediction in Cancer Cell Lines

Alexander Partin (1, 2), Thomas Brettin (2, 3), Yvonne A. Evrard (4), Yitan Zhu (1, 2), Hyunseung Yoo (1, 2), Fangfang Xia (1, 2), Songhao Jiang (7), Austin Clyde (1, 7), Maulik Shukla (1, 2), Michael Fonstein (5), James H. Doroshow (6), Rick Stevens (3, 7)

(1) Division of Data Science and Learning, Argonne National Laboratory, Argonne, IL, USA
(2) University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
(3) Computing, Environment and Life Sciences, Argonne National Laboratory, Lemont, IL, USA
(4) Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., Frederick, MD, USA
(5) Biosciences Division, Argonne National Laboratory, Lemont, IL, USA
(6) Division of Cancer Therapeutics and Diagnosis, National Cancer Institute, Bethesda, MD, USA
(7) Department of Computer Science, The University of Chicago, Chicago, IL, USA

arXiv:2011.12466·q-bio.QM·November 30, 2020

TL;DR

This study uses empirical learning curves to evaluate how different machine learning models, including neural networks and gradient boosting decision trees, scale with increasing data for predicting drug response in cancer cell lines, revealing model-dataset interactions and guiding future data collection.

Contribution

The paper introduces a power-law fitting framework for learning curves in drug response prediction and uses it to compare two neural networks and two GBDT models across four drug screening datasets and a range of training sizes.

Findings

01 Multi-input neural networks outperform single-input models at larger data sizes.

02 Gradient boosting decision trees perform better at smaller training sizes.

03 Increasing data size is likely to improve model prediction performance.

Abstract

Motivated by the size of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating data, a common question is whether the proposed predictors can further improve the generalization performance with more training data. We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and two gradient boosting decision tree (GBDT) models trained on four drug screening datasets. The learning curves are accurately fitted to a power law model, providing a framework for assessing the data scaling behavior of these predictors. The curves demonstrate that no single model dominates in terms of prediction performance across all datasets and training sizes, suggesting that the shape of these curves…
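
The power-law fit mentioned in the abstract can be illustrated with a short sketch. It assumes a three-parameter form, error(m) = a * m^(-b) + c, where m is the training set size. This parameterization, the function names, the starting values, and the synthetic noise-free data are all assumptions made for illustration; they are not taken from the paper or its repository.

```python
import numpy as np
from scipy.optimize import curve_fit


def power_law(m, a, b, c):
    """Assumed learning-curve form: error(m) = a * m**(-b) + c."""
    return a * np.power(m, -b) + c


def fit_learning_curve(train_sizes, errors):
    """Fit the assumed power law to (training size, error) pairs.

    Returns the fitted (a, b, c). All three are bounded below by zero,
    so the fitted curve is non-increasing and levels off at c.
    """
    m = np.asarray(train_sizes, dtype=float)
    s = np.asarray(errors, dtype=float)
    params, _ = curve_fit(
        power_law,
        m,
        s,
        p0=(1.0, 0.5, 0.01),  # hypothetical starting guess
        bounds=([0.0, 0.0, 0.0], [np.inf, 5.0, np.inf]),
        maxfev=10000,
    )
    return params


# Synthetic, noise-free errors generated from known (hypothetical) parameters.
sizes = 2 ** np.arange(7, 18)  # 128 ... 131072 training samples
errors = power_law(sizes, 2.0, 0.4, 0.05)

a, b, c = fit_learning_curve(sizes, errors)
print(f"a={a:.3f}, b={b:.3f}, c={c:.3f}")
print("extrapolated error at 1e6 samples:", power_law(1e6, a, b, c))
```

In this form, c is the error floor the curve approaches as data grows, and b controls how quickly additional data helps. Fitting one curve per model and dataset gives a simple way to compare where the curves cross as training size increases.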

Code & Models

Repositories

adpartin/dr-learning-curves (tf, Official)
