# On the Use of Random Forest for Two-Sample Testing

**Authors:** Simon Hediger, Loris Michel, Jeffrey Näf

arXiv: 1903.06287 · 2021-05-07

## TL;DR

This paper introduces two-sample tests based on the Random Forest classifier. The tests are easy to implement, need almost no tuning, and use the forest's built-in variable importance to indicate which variables drive a difference in distribution. The paper also gives an asymptotic power analysis and two real-world applications.

## Contribution

The paper develops Random Forest-based two-sample tests, analyzes their asymptotic power, and provides an implementation in the hypoRF R package.

## Key findings

- The proposed tests are easy to use and require almost no tuning.
- They are applicable to any distribution on $\mathbb{R}^d$.
- Two real-world applications illustrate the usefulness of the methodology.

## Abstract

Following the line of classification-based two-sample testing, tests based on the Random Forest classifier are proposed. The developed tests are easy to use, require almost no tuning, and are applicable for any distribution on $\mathbb{R}^d$. Furthermore, the built-in variable importance measure of the Random Forest gives potential insights into which variables account for the difference in distribution. An asymptotic power analysis for the proposed tests is developed. Finally, two real-world applications illustrate the usefulness of the introduced methodology. To simplify the use of the method, the R-package "hypoRF" is provided.
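
The general idea of a classification-based two-sample test can be illustrated with a minimal Python sketch. It is not the hypoRF API and not necessarily the exact procedure of the paper. It labels the two samples 0 and 1, trains a Random Forest on one half of the data, and checks whether the error on the held-out half is significantly below 1/2, the level a classifier cannot beat when both samples come from the same distribution. The function names `rf_two_sample_test` and `binomial_cdf` are hypothetical. The sketch assumes scikit-learn's `RandomForestClassifier`, equal sample sizes, an even train/test split, and a Binomial(n_test, 1/2) reference distribution for the number of held-out errors. The returned importances are scikit-learn's impurity-based `feature_importances_`, which may differ from the importance measure used in the paper.

```python
import math

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def binomial_cdf(k, n, p=0.5):
    """P(Bin(n, p) <= k), summed in log space to stay stable for large n."""
    total = 0.0
    for i in range(k + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log(1 - p)
        )
        total += math.exp(log_term)
    return min(1.0, total)


def rf_two_sample_test(x, y, n_trees=200, seed=0):
    """Hypothetical helper: test H0 'x and y share one distribution'.

    x and y are arrays of shape (n, d) with the same n (an assumption of
    this sketch). Returns (held-out error, p-value, feature importances).
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")

    rng = np.random.default_rng(seed)
    half = len(x) // 2
    ix, iy = rng.permutation(len(x)), rng.permutation(len(y))

    # First half of each sample trains the forest, second half evaluates it.
    train_features = np.vstack([x[ix[:half]], y[iy[:half]]])
    train_labels = np.repeat([0, 1], half)
    test_features = np.vstack([x[ix[half:]], y[iy[half:]]])
    test_labels = np.repeat([0, 1], len(x) - half)

    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(train_features, train_labels)

    n_errors = int(np.sum(forest.predict(test_features) != test_labels))
    n_test = len(test_labels)

    # Small p-value: far fewer held-out errors than chance level would give.
    p_value = binomial_cdf(n_errors, n_test, p=0.5)
    return n_errors / n_test, p_value, forest.feature_importances_


# Demonstration on synthetic data with a mean shift in every coordinate.
rng = np.random.default_rng(1)
p_sample = rng.normal(size=(200, 5))
q_sample = rng.normal(loc=1.0, size=(200, 5))
error, p_value, importances = rf_two_sample_test(p_sample, q_sample)
print(f"held-out error = {error:.3f}, p-value = {p_value:.3g}")
print("importances:", np.round(importances, 3))
```

Splitting the data keeps the sketch simple, because the held-out labels are independent of the fitted forest, but it uses only half of the observations for testing. For real analyses in R, the hypoRF package mentioned in the abstract is the authors' implementation.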

---
Source: https://tomesphere.com/paper/1903.06287