# Predicting phenotypes from microarrays using amplified, initially marginal, eigenvector regression

**Authors:** Lei Ding, Daniel J. McDonald

arXiv: 1907.05927 · 2019-07-16

## TL;DR

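This paper introduces a gene expression analysis method that uses the marginal relationship between each gene and the phenotype to select a small subset of genes, builds a low-dimensional eigenvector embedding from that subset, and amplifies the embedding with information from the remaining genes. It targets phenotype prediction, gene selection, and the discovery of gene-gene relationships when there are many expression measurements but few patients.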

## Contribution

The authors develop an amplified, initially marginal, eigenvector regression technique for phenotype prediction and gene selection from microarray data. They report that it is computationally tractable and generally outperforms the existing methods they compare against.

## Key findings

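- The method is computationally tractable with many expression measurements and relatively few patients.
- It generally outperforms other methods on the diffuse large B-cell lymphoma data and on a number of other gene expression datasets.
- It identifies different genes than existing methods do, which are candidates for possible future study.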

## Abstract

**Motivation:** The discovery of relationships between gene expression measurements and phenotypic responses is hampered by both computational and statistical impediments. Conventional statistical methods are less than ideal because they either fail to select relevant genes, predict poorly, ignore the unknown interaction structure between genes, or are computationally intractable. Thus, the creation of new methods which can handle many expression measurements on relatively small numbers of patients while also uncovering gene-gene relationships and predicting well is desirable.

**Results:** We develop a new technique for using the marginal relationship between gene expression measurements and patient survival outcomes to identify a small subset of genes which appear highly relevant for predicting survival, produce a low-dimensional embedding based on this small subset, and amplify this embedding with information from the remaining genes. We motivate our methodology by using gene expression measurements to predict survival time for patients with diffuse large B-cell lymphoma, illustrate the behavior of our methodology on carefully constructed synthetic examples, and test it on a number of other gene expression datasets. Our technique is computationally tractable, generally outperforms other methods, is extensible to other phenotypes, and also identifies different genes (relative to existing methods) for possible future study.

**Key words:** regression; principal components; matrix sketching; preconditioning

**Availability:** All of the code and data are available at https://github.com/dajmcdon/aimer/.
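The three steps described in the abstract (marginal screening, a low-dimensional embedding from the selected genes, and amplification with the remaining genes) can be illustrated with a small NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `marginal_eigen_sketch`, the parameters `n_select` and `n_components`, the use of absolute marginal correlation as the screening score, and the synthetic data are all hypothetical, the response is treated as a plain continuous value, and any scaling or final coefficient thresholding used in the paper is not reproduced. The authors' own code is in the repository linked above.

```python
import numpy as np

def marginal_eigen_sketch(X, y, n_select=10, n_components=2):
    """Illustrative three-step sketch (hypothetical, not the paper's code)."""
    Xc = X - X.mean(axis=0)              # center each gene (column)
    yc = y - y.mean()

    # Step 1: marginal screening. Score each gene by its absolute
    # correlation with the response and keep the top n_select genes.
    norms = np.linalg.norm(Xc, axis=0)
    norms[norms == 0] = 1.0              # guard against constant genes
    score = np.abs(Xc.T @ yc) / norms
    selected = np.argsort(score)[-n_select:]

    # Steps 2 and 3: form the p x n_select block of the Gram matrix that
    # pairs every gene with the selected genes. Its leading left singular
    # vectors give directions over all p genes, so the embedding built
    # from the subset is extended with information from the other genes.
    F = Xc.T @ Xc[:, selected]
    U, _, _ = np.linalg.svd(F, full_matrices=False)
    V_hat = U[:, :n_components]          # p x n_components directions

    # Regress the response on the low-dimensional embedding and map the
    # fitted coefficients back to the gene scale.
    Z = Xc @ V_hat                       # n x n_components embedding
    gamma, *_ = np.linalg.lstsq(Z, yc, rcond=None)
    beta = V_hat @ gamma                 # one coefficient per gene
    return selected, beta

# Synthetic usage example: 40 patients, 200 genes, and one latent factor
# that drives both the first five genes and the response.
rng = np.random.default_rng(0)
n, p = 40, 200
latent = rng.normal(size=n)
X = rng.normal(size=(n, p))
X[:, :5] += 3.0 * latent[:, None]
y = 2.0 * latent + 0.1 * rng.normal(size=n)

selected, beta = marginal_eigen_sketch(X, y)
```

Screening first keeps the singular value decomposition small (a `p x n_select` matrix rather than the full `p x p` Gram matrix), which is one way to read the "matrix sketching" key word; the exact construction in the paper may differ.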

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.05927/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1907.05927/full.md

## References

55 references — full list in the complete paper: https://tomesphere.com/paper/1907.05927/full.md

---
Source: https://tomesphere.com/paper/1907.05927