# Improved high-dimensional prediction with Random Forests by the use of co-data

**Authors:** Dennis E. te Beest, Steven W. Mes, Ruud H. Brakenhoff, Mark A. van de Wiel

arXiv: 1706.00641 · 2017-06-05

## TL;DR

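This paper introduces a method that improves high-dimensional prediction with Random Forests by incorporating auxiliary co-data, meaning information about the variables that does not use the response labels. The co-data steer which variables are sampled as split candidates, which improves variable selection and predictive performance.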

## Contribution

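The paper proposes a co-data moderated Random Forest (CoRF), which replaces the uniform probabilities used to draw candidate variables at each split with probabilities informed by external co-data, in order to improve prediction in high-dimensional settings.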
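The core change CoRF makes is to how candidate variables are drawn at each split: a standard Random Forest samples them uniformly, while CoRF samples them with unequal probabilities. The sketch below is a minimal illustration of that one step, assuming weighted sampling without replacement via Efraimidis-Spirakis keys; the function name `draw_candidates` and the example weights are hypothetical and not taken from the paper or any Random Forest library.

```python
import random


def draw_candidates(weights, mtry, rng):
    """Hypothetical helper: draw `mtry` distinct variable indices for one split.

    Each variable's chance of entering the sample grows with its weight
    (weighted sampling without replacement, Efraimidis-Spirakis keys).
    Variables with weight 0 are never drawn.
    """
    keyed = []
    for j, w in enumerate(weights):
        if w > 0:
            # Key u ** (1 / w): larger weights push keys toward 1.
            keyed.append((rng.random() ** (1.0 / w), j))
    # Keep the mtry largest keys; return indices in ascending order.
    keyed.sort(reverse=True)
    return sorted(j for _, j in keyed[:mtry])


rng = random.Random(0)
p, mtry = 10, 3

# Standard Random Forest: every variable equally likely.
uniform = [1.0] * p
# Illustrative (made-up) co-data moderated weights:
# variables 0-2 favoured, variable 9 excluded.
moderated = [5.0, 5.0, 5.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0]

print(draw_candidates(uniform, mtry, rng))
print(draw_candidates(moderated, mtry, rng))
```

With uniform weights this reduces to ordinary candidate sampling; with moderated weights, favoured variables are considered for splits more often, and a zero weight removes a variable from consideration entirely.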

## Key findings

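- In the example studied, CoRF improved predictive performance over a standard Random Forest.
- External p-values, a gene signature, and the correlation between gene expression and DNA copy number served as co-data and improved predictive performance.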
- The method is demonstrated on gene expression data for lymph node metastasis prediction.
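The co-data listed above are all per-variable (per-gene) quantities that do not touch the response labels. As a rough sketch of how they might be encoded, assuming a grouping approach that is our illustration rather than the paper's stated encoding: external p-values are binned into groups, signature membership becomes a 0/1 label, and each gene's expression is correlated with its copy number across samples. The function names and the p-value cutoffs are hypothetical.

```python
def pvalue_groups(pvalues, cutoffs=(0.001, 0.01, 0.1)):
    """Hypothetical helper: assign each variable a group from its external p-value.

    Group g holds p-values in (cutoffs[g-1], cutoffs[g]]; the last group holds
    p-values above every cutoff. The cutoffs are illustrative only.
    """
    groups = []
    for pv in pvalues:
        g = 0
        while g < len(cutoffs) and pv > cutoffs[g]:
            g += 1
        groups.append(g)
    return groups


def signature_groups(genes, signature):
    """Hypothetical helper: 1 if the gene is in the signature, else 0."""
    members = set(signature)
    return [1 if gene in members else 0 for gene in genes]


def pearson(x, y):
    """Pearson correlation of one gene's expression x and copy number y across samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant input: correlation undefined, treat as 0
    return sxy / (sxx * syy) ** 0.5


genes = ["g1", "g2", "g3", "g4"]
print(pvalue_groups([0.0004, 0.02, 0.3, 0.005]))  # [0, 2, 3, 1]
print(signature_groups(genes, ["g2", "g4"]))      # [0, 1, 0, 1]
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0
```

Turning continuous co-data into a few groups keeps the number of quantities to learn small, which matters when there are far more genes than samples.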

## Abstract

Prediction in high-dimensional settings is difficult due to the large number of variables relative to the sample size. We demonstrate how auxiliary "co-data" can be used to improve the performance of a Random Forest in such a setting. Co-data are incorporated in the Random Forest by replacing the uniform sampling probabilities used to draw candidate variables (the default for a Random Forest) by co-data moderated sampling probabilities. Co-data here are defined as any type of information that is available on the variables of the primary data but does not use its response labels. These moderated sampling probabilities are, inspired by empirical Bayes, learned from the data at hand. We demonstrate this co-data moderated Random Forest (CoRF) with one example, in which we aim to predict lymph node metastasis from gene expression data. We show how a set of external p-values, a gene signature, and the correlation between gene expression and DNA copy number can improve the predictive performance.
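The abstract states that the moderated sampling probabilities are learned from the data at hand, in the spirit of empirical Bayes. A minimal sketch of that idea, under assumptions that are ours and not the paper's stated estimator: a base Random Forest with uniform sampling has already been fitted, `split_counts[j]` records how often variable j was used to split, and the counts are pooled within each co-data group so that every member of a group shares one probability. The function name `moderated_probabilities` and the `baseline` option are hypothetical; the paper's actual procedure may differ (for example, by regressing counts on continuous co-data).

```python
from collections import defaultdict


def moderated_probabilities(split_counts, groups, baseline=0.0):
    """Hypothetical helper: turn base-forest split counts into sampling probabilities.

    split_counts[j]: times variable j was used for a split in a base
        Random Forest fitted with uniform sampling (assumed input).
    groups[j]: co-data group label of variable j (e.g. signature yes/no).
    baseline: count subtracted from each group mean before truncating at
        zero; 0.0 keeps every group that was used at all.
    """
    totals = defaultdict(float)
    sizes = defaultdict(int)
    for count, g in zip(split_counts, groups):
        totals[g] += count
        sizes[g] += 1
    # Pool evidence within each group: every member gets the group mean count.
    group_mean = {g: totals[g] / sizes[g] for g in totals}
    weights = [max(0.0, group_mean[g] - baseline) for g in groups]
    total = sum(weights)
    if total == 0:
        # No group stands out: fall back to uniform sampling.
        return [1.0 / len(groups)] * len(groups)
    return [w / total for w in weights]


# Made-up counts for 8 variables; the first 3 are in a co-data group labelled 1.
counts = [12, 9, 15, 2, 1, 0, 3, 2]
groups = [1, 1, 1, 0, 0, 0, 0, 0]

# Group means are 12 and 1.6, so probabilities are 12/44 and 1.6/44.
print(moderated_probabilities(counts, groups))
# Subtracting the overall mean count (5.5) zeroes out group 0.
print(moderated_probabilities(counts, groups, baseline=sum(counts) / len(counts)))
```

Because probabilities are shared within a group, only as many quantities as there are co-data groups are estimated, which is what makes the reweighting stable when the variables vastly outnumber the samples. The resulting list would then serve as the per-variable weights for drawing split candidates in a refitted forest.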

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.00641/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1706.00641/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1706.00641/full.md

---
Source: https://tomesphere.com/paper/1706.00641