The TCGA Meta-Dataset Clinical Benchmark
Mandana Samiei, Tobias Würfl, Tristan Deleu, Martin Weiss, Francis Dutil, Thomas Fevens, Geneviève Boucher, Sebastien Lemieux, Joseph Paul Cohen

TL;DR
This paper introduces a comprehensive TCGA Meta-Dataset with 174 tasks for multi-outcome clinical prediction from gene expression data. It aims to standardize benchmarks and to facilitate the development of methods suited to small sample sizes.
Contribution
It provides a large, unified benchmark dataset from TCGA for multi-task clinical prediction, addressing inconsistencies and enabling research on few-sample gene expression analysis.
Findings
Neural networks outperform regression on multiple tasks.
150 samples suffice for baseline model training.
Diverse clinical variables can be predicted from gene expression.
Abstract
Machine learning is bringing a paradigm shift to healthcare by changing the process of disease diagnosis and prognosis in clinics and hospitals. This development equips doctors and medical staff with tools to evaluate their hypotheses and hence make more precise decisions. Although most current research in the literature seeks to develop techniques for predicting one particular clinical outcome, this approach is far from the reality of clinical decision making, in which several factors must be considered simultaneously. In addition, it is difficult to track recent progress concretely because benchmark datasets and task definitions in genomics lack consistency. To address these issues, we provide a clinical Meta-Dataset derived from the publicly available data hub called The Cancer Genome Atlas Program (TCGA) that contains 174…
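The abstract describes the Meta-Dataset as a collection of prediction tasks, each pairing gene expression with one clinical outcome. The sketch below shows one way such tasks might be assembled from per-sample records. It is an illustration under stated assumptions, not the authors' construction procedure: the `Sample`/`Task` record layout, the `min_samples` cutoff, the two-class requirement and the random train/test split are all hypothetical, and the example data are synthetic.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Sample:
    """One patient sample (hypothetical layout): expression vector plus clinical annotations."""
    sample_id: str
    expression: List[float]              # one value per gene
    clinical: Dict[str, Optional[str]]   # attribute name -> value, None if missing


@dataclass
class Task:
    """A single prediction task: one clinical attribute used as the target."""
    attribute: str
    X: List[List[float]]
    y: List[str]


def build_tasks(samples: Sequence[Sample], min_samples: int = 150) -> List[Task]:
    """Create one task per clinical attribute, keeping only samples labelled for it.

    min_samples is a hypothetical cutoff for dropping tasks with too few labels.
    """
    attributes = sorted({a for s in samples for a in s.clinical})
    tasks = []
    for attr in attributes:
        labelled = [s for s in samples if s.clinical.get(attr) is not None]
        # A task whose labels all share one class gives nothing to predict.
        classes = {s.clinical[attr] for s in labelled}
        if len(labelled) >= min_samples and len(classes) >= 2:
            tasks.append(Task(attr,
                              [s.expression for s in labelled],
                              [s.clinical[attr] for s in labelled]))
    return tasks


def split_task(task: Task, test_fraction: float = 0.2, seed: int = 0):
    """Shuffle sample indices reproducibly and split them into (train, test)."""
    idx = list(range(len(task.y)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)

    def pick(ids: List[int]) -> Tuple[List[List[float]], List[str]]:
        return [task.X[i] for i in ids], [task.y[i] for i in ids]

    return pick(idx[n_test:]), pick(idx[:n_test])


# Usage on synthetic data: 200 samples, 5 "genes", two clinical attributes.
rng = random.Random(42)
samples = [
    Sample(f"S{i}",
           [rng.gauss(0.0, 1.0) for _ in range(5)],
           {"tumor_stage": rng.choice(["I", "II"]),
            "rare_marker": "pos" if i < 10 else None})
    for i in range(200)
]
tasks = build_tasks(samples)
print([t.attribute for t in tasks])  # rare_marker has only 10 labels, so it is dropped
train, test = split_task(tasks[0])
```

Filtering per attribute, rather than requiring every sample to be fully annotated, lets each task keep as many labelled samples as possible, which matters when sample sizes are small.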
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Frailty in Older Adults · Meta-analysis and systematic reviews
