Large scale statistical analysis of GEO datasets

Bernard Ycart (LJK); Konstantina Charmpi (LJK); Sophie Rousseaux; Jean-Jacques Fournié (CRCT)

arXiv:1410.2585 · stat.ME · October 10, 2014 · 1 citation

TL;DR

This paper performs a large-scale statistical analysis of 20 GEO gene expression datasets, demonstrating that meaningful biological insights can be obtained by combining data from different sources despite their variability.

Contribution

It introduces a robust statistical approach for integrating multiple gene expression datasets, highlighting the potential for extracting biological information from merged sources.

Findings

01

Significant biological signals can be identified across merged datasets.

02

Differences between datasets are compared with the variability within a given dataset.

03

Merging datasets enhances the power to detect biological patterns.

Abstract

The problem addressed here is that of simultaneous treatment of several gene expression datasets, possibly collected under different experimental conditions and/or platforms. Using robust statistics, a large-scale statistical analysis has been conducted over 20 datasets downloaded from the Gene Expression Omnibus repository. The differences between datasets are compared to the variability inside a given dataset. Evidence that meaningful biological information can be extracted by merging different sources is provided.
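The abstract does not say which robust statistics the authors use, so the following is only a minimal sketch of the general idea, assuming the median and the median absolute deviation (MAD) as the robust location and scale. The function names, the scaling constant 1.4826 (which makes the MAD consistent with the standard deviation under normality), and the toy values are illustrative assumptions, not taken from the paper.

```python
from statistics import median


def mad(values):
    """Median absolute deviation around the median (unscaled)."""
    m = median(values)
    return median([abs(v - m) for v in values])


def robust_standardize(values):
    """Center by the median and scale by 1.4826 * MAD.

    Assumed normalization for illustration; the paper may use another one.
    """
    m = median(values)
    s = 1.4826 * mad(values)
    if s == 0:
        raise ValueError("MAD is zero; values cannot be scaled")
    return [(v - m) / s for v in values]


def between_vs_within(datasets):
    """Compare spread across datasets with spread inside datasets.

    datasets maps a dataset name to the expression values of one gene.
    Returns (between, within): the MAD of the per-dataset medians and
    the median of the per-dataset MADs.
    """
    medians = [median(v) for v in datasets.values()]
    within = median([mad(v) for v in datasets.values()])
    between = mad(medians)
    return between, within


# Made-up expression values for a single gene in three hypothetical datasets.
toy = {
    "dataset_a": [5.1, 5.3, 4.9, 5.0, 9.0],  # contains one outlier
    "dataset_b": [6.0, 6.2, 5.8, 6.1, 5.9],
    "dataset_c": [5.5, 5.4, 5.6, 5.7, 5.3],
}

between, within = between_vs_within(toy)
z_a = robust_standardize(toy["dataset_a"])
print("between-dataset spread:", between)
print("within-dataset spread:", within)
print("standardized dataset_a:", z_a)
```

Because the median and MAD are barely affected by a single extreme value, the outlier in `dataset_a` keeps a large standardized score instead of inflating the scale estimate, which is the usual reason for preferring robust statistics when pooling heterogeneous sources.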

Taxonomy

Topics: Gene expression and cancer classification · Evolutionary Algorithms and Applications · Machine Learning and Data Classification