Large scale statistical analysis of GEO datasets
Bernard Ycart (LJK), Konstantina Charmpi (LJK), Sophie Rousseaux, Jean-Jacques Fournié (CRCT)

TL;DR
This paper performs a large-scale statistical analysis of 20 GEO gene expression datasets, demonstrating that meaningful biological insights can be obtained by combining data from different sources despite their variability.
Contribution
It introduces a robust statistical approach for integrating multiple gene expression datasets, highlighting the potential for extracting biological information from merged sources.
Findings
Significant biological signals can be identified across merged datasets.
Differences between datasets are comparable to within-dataset variability.
Merging datasets enhances the power to detect biological patterns.
Abstract
The problem addressed here is that of simultaneous treatment of several gene expression datasets, possibly collected under different experimental conditions and/or platforms. Using robust statistics, a large scale statistical analysis has been conducted over datasets downloaded from the Gene Expression Omnibus repository. The differences between datasets are compared to the variability inside a given dataset. Evidence that meaningful biological information can be extracted by merging different sources is provided.
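As an illustration of the kind of comparison the abstract describes, the sketch below contrasts between-dataset differences with within-dataset variability using the median and the median absolute deviation (MAD). The abstract does not specify which robust statistics the authors used, so the choice of median and MAD, the function names, and the toy values are all assumptions made for illustration only.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: a robust measure of spread."""
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)


def robust_center_scale(values):
    """Center by the median and scale by the MAD (assumed normalization)."""
    values = list(values)
    center = median(values)
    spread = mad(values)
    if spread == 0:
        raise ValueError("MAD is zero; cannot scale")
    return [(v - center) / spread for v in values]


def between_vs_within(datasets):
    """Return (between, within) robust spreads for one gene.

    between: MAD of the per-dataset medians.
    within:  median of the per-dataset MADs.
    """
    medians = [median(d) for d in datasets.values()]
    within = median(mad(d) for d in datasets.values())
    between = mad(medians)
    return between, within


# Hypothetical expression values of a single gene in three datasets.
toy = {
    "dataset_a": [5.1, 5.3, 4.9, 5.0, 9.0],  # contains one outlier
    "dataset_b": [7.2, 7.0, 7.4, 7.1, 6.9],
    "dataset_c": [6.0, 6.2, 5.8, 6.1, 5.9],
}

between, within = between_vs_within(toy)
print(f"between-dataset spread: {between:.2f}")
print(f"within-dataset spread:  {within:.2f}")

# After robust centering and scaling, each dataset has median 0,
# which puts the datasets on a common scale before merging.
merged = [x for d in toy.values() for x in robust_center_scale(d)]
```

Median-based statistics are used here because a single outlier, such as the 9.0 in the first toy dataset, barely shifts them, whereas it would inflate a mean or a standard deviation.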
Taxonomy
Topics: Gene expression and cancer classification · Evolutionary Algorithms and Applications · Machine Learning and Data Classification
