Are Thousands of Samples Really Needed to Generate Robust Gene-List for Prediction of Cancer Outcome?

Royi Jacobovic

arXiv:1701.03159·stat.AP·October 17, 2017·1 cites

Open Access

TL;DR

This paper questions whether thousands of samples are really needed to generate a robust gene list for predicting cancer outcome. It argues that the required sample size may be overestimated because the underlying model assumptions are violated and the empirical Bayes methodology has limitations.

Contribution

It challenges the conclusions of Ein-Dor (2006) and Zuk (2007) by showing that their key statistical assumptions are inconsistent with additional assumptions such as sparsity and Gaussianity, and that the empirical Bayes methodology may lead to overestimating the required sample size.

Findings

01

Model assumptions are inconsistent with sparsity and Gaussianity.

02

Empirical Bayes methods fail to detect severe assumption violations.

03

Overestimation of required sample size may occur due to these issues.

Abstract

The prediction of cancer prognosis and metastatic potential immediately after the initial diagnosis is a major challenge in current clinical research. The relevance of such a prognostic signature is clear, as it would spare many patients the agony and toxic side effects of adjuvant chemotherapy that is automatically, and sometimes carelessly, prescribed to them. Motivated by this issue, Ein-Dor (2006) and Zuk (2007) presented a Bayesian model which leads to the following conclusion: thousands of samples are needed to generate a robust gene list for predicting outcome. This conclusion rests on several statistical assumptions. The current work raises doubts about this conclusion by showing that: (1) these assumptions are not consistent with additional assumptions such as sparsity and Gaussianity; (2) the empirical Bayes methodology which was suggested in order to test the…

Taxonomy

Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Biomedical Text Mining and Ontologies