Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology
Keith A. Baggerly, Kevin R. Coombes

TL;DR
This paper highlights the importance of reproducible data processing in high-throughput biology, revealing common errors in microarray-based drug sensitivity studies that could impact patient treatment decisions.
Contribution
It demonstrates how simple errors in data analysis can lead to incorrect conclusions in microarray studies and proposes steps to improve research reproducibility.
Findings
Identification of common data processing errors in published studies
Evidence that these errors matter clinically: patients in trials are being allocated to treatment arms on the basis of the affected results
Call for better documentation and validation practices
Abstract
High-throughput biological assays such as microarrays let us ask very detailed questions about how diseases operate, and promise to let us personalize therapy. Data processing, however, is often not described well enough to allow for exact reproduction of the results, leading to exercises in "forensic bioinformatics" where aspects of raw data and reported results are used to infer what methods must have been employed. Unfortunately, poor documentation can shift from an inconvenience to an active danger when it obscures not just methods but errors. In this report we examine several related papers purporting to use microarray-based signatures of drug sensitivity derived from cell lines to predict patient response. Patients in clinical trials are currently being allocated to treatment arms on the basis of these results. However, we show in five case studies that the results incorporate…
