Statistical Depth based Normalization and Outlier Detection of Gene Expression Data
Alicia Nieto-Reyes, Javier Cabrera

TL;DR
This paper introduces a statistical depth-based normalization method for gene expression data that preserves properties of the median and proposes analytical outlier detection techniques for genes and samples, addressing high-dimensional challenges.
Contribution
It presents a novel normalization procedure using statistical data depth and analytical outlier detection methods tailored for high-dimensional gene expression data.
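A minimal sketch of the normalization idea, assuming a genes × samples matrix. Each sample (array) is treated as a point in gene space. The deepest sample is taken as the reference, and every array is mapped rank by rank onto the reference's sorted values. Spatial (L1) depth is used here only as an illustrative assumption, because this summary does not say which depth function the paper uses, how it handles ties, or whether depth is computed on raw or sorted arrays. The names `spatial_depth`, `map_to_reference` and `depth_normalize` are hypothetical.

```python
import numpy as np


def spatial_depth(points: np.ndarray) -> np.ndarray:
    """Spatial (L1) depth of each row of `points` relative to all rows.

    D(x) = 1 - || (1/n) * sum_i (x - x_i) / ||x - x_i|| ||, where zero
    differences (the point itself, exact duplicates) contribute nothing.
    """
    n = points.shape[0]
    depths = np.empty(n)
    for j in range(n):
        diffs = points[j] - points                 # (n, p) difference vectors
        norms = np.linalg.norm(diffs, axis=1)
        mask = norms > 0                           # skip zero differences
        units = diffs[mask] / norms[mask, None]    # unit direction vectors
        depths[j] = 1.0 - np.linalg.norm(units.sum(axis=0)) / n
    return depths


def map_to_reference(expr: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Give every column the sorted reference values while keeping its within-column ranks."""
    ref_sorted = np.sort(reference)
    ranks = np.argsort(np.argsort(expr, axis=0, kind="stable"), axis=0)
    return ref_sorted[ranks]


def depth_normalize(expr: np.ndarray):
    """Hypothetical depth-based normalization of a genes x samples matrix.

    Returns the normalized matrix and the index of the deepest sample,
    whose value distribution becomes the common target distribution.
    """
    depths = spatial_depth(expr.T)                 # samples are points in gene space
    deepest = int(np.argmax(depths))
    return map_to_reference(expr, expr[:, deepest]), deepest
```

Spatial depth is a convenient stand-in because it is cheap and well defined when genes far outnumber samples. Any other depth function could replace it without changing `map_to_reference`. Unlike a synthetic reference, the target distribution here is always that of an observed array.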
Findings
Effective normalization preserving median properties
Analytical outlier detection for genes and samples
Application to four gene expression datasets
Abstract
Normalization and outlier detection belong to the preprocessing of gene expression data. We propose a natural normalization procedure based on statistical data depth, which normalizes to the distribution of gene expressions of the most representative gene expression array of the group. This differs from the standard method of quantile normalization, which is based on the coordinate-wise median array, an array that lacks the well-known properties of the one-dimensional median. The statistical data depth maintains those good properties. Gene expression data are known to contain outliers. Although detecting outlier genes in a given gene expression dataset has been broadly studied, these methodologies do not apply to detecting outlier samples, given the difficulties posed by the high-dimensional but low-sample-size structure of the data. The standard procedures used for detecting outlier samples are visual…
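For contrast, here is a sketch of quantile normalization whose target distribution is the coordinate-wise median of the column-sorted arrays, as the abstract describes. Many implementations use the row mean instead. The final lines illustrate one property the coordinate-wise median lacks in three or more dimensions: it can fall outside the convex hull of the data. A one-dimensional median, by contrast, always lies between the observations. The function names are hypothetical, and the example is not taken from the paper.

```python
import numpy as np


def coordinatewise_median_reference(expr: np.ndarray) -> np.ndarray:
    """Median across samples of each row of the column-sorted genes x samples matrix."""
    return np.median(np.sort(expr, axis=0), axis=1)


def quantile_normalize_median(expr: np.ndarray) -> np.ndarray:
    """Quantile normalization onto the coordinate-wise median reference distribution."""
    reference = coordinatewise_median_reference(expr)   # nondecreasing by construction
    ranks = np.argsort(np.argsort(expr, axis=0, kind="stable"), axis=0)
    return reference[ranks]


# The coordinate-wise median need not behave like a 1-D median. For these three
# points in R^3 it is the origin, which lies outside their convex hull
# (every convex combination of the points has coordinates summing to 1).
corners = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
cw_median = np.median(corners, axis=0)
print(cw_median, cw_median.sum())
```

The reference built this way is generally not the distribution of any observed array, which is the gap a depth-based reference is meant to close.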
Taxonomy
Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Fault Detection and Control Systems
