TL;DR
This paper introduces two-sample tests based on the Random Forest classifier. The tests are easy to implement, need almost no tuning, and use the forest's built-in variable importance to indicate which variables differ between the two samples. The paper also develops an asymptotic power analysis and presents two real-world applications.
Contribution
It develops a novel Random Forest-based two-sample test with asymptotic power analysis and practical implementation via the hypoRF R-package.
Findings
The proposed test is easy to use and tune.
It is applicable to any distribution on $\mathbb{R}^d$.
Two real-world applications illustrate its usefulness.
Abstract
Following the line of classification-based two-sample testing, tests based on the Random Forest classifier are proposed. The developed tests are easy to use, require almost no tuning, and are applicable to any distribution on $\mathbb{R}^d$. Furthermore, the built-in variable importance measure of the Random Forest gives potential insights into which variables account for the difference in distribution. An asymptotic power analysis for the proposed tests is developed. Finally, two real-world applications illustrate the usefulness of the introduced methodology. To simplify the use of the method, the R-package "hypoRF" is provided.
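To make the idea of a classification-based two-sample test concrete, here is a minimal sketch in Python. It is not the hypoRF implementation and does not reproduce the paper's exact procedure. As an assumption for illustration, a small bagged ensemble of decision stumps with random feature selection stands in for a full Random Forest, and the test rejects when the held-out classification error is significantly below 1/2 according to an approximate binomial test. All function names (`fit_stump`, `fit_forest`, `classifier_two_sample_test`, and so on) are hypothetical.

```python
import math
import random


def fit_stump(X, y, features):
    """Fit a depth-1 tree: best (feature, threshold, label-above) on the data."""
    best = None
    n = len(y)
    for j in features:
        for t in sorted({row[j] for row in X}):
            # Errors when predicting class 1 above the threshold, class 0 otherwise.
            err1 = sum((row[j] > t) != (lab == 1) for row, lab in zip(X, y))
            # Flipping the predicted labels turns err1 errors into n - err1.
            for label_above, err in ((1, err1), (0, n - err1)):
                if best is None or err < best[0]:
                    best = (err, j, t, label_above)
    return best[1:]


def fit_forest(X, y, rng, n_trees=50):
    """Bagged stumps with random feature subsets (a stand-in for a Random Forest)."""
    n, d = len(y), len(X[0])
    m = max(1, int(math.sqrt(d)))  # number of features tried per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        feats = rng.sample(range(d), m)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, x):
    """Majority vote over the stumps; ties go to class 0."""
    votes = sum(lab if x[j] > t else 1 - lab for j, t, lab in forest)
    return 1 if 2 * votes > len(forest) else 0


def binom_cdf(k, n, p=0.5):
    """P(Binomial(n, p) <= k), computed exactly."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def classifier_two_sample_test(sample_p, sample_q, seed=0):
    """Return (held-out error, approximate p-value) for H0: P = Q."""
    rng = random.Random(seed)
    data = [(x, 0) for x in sample_p] + [(x, 1) for x in sample_q]
    rng.shuffle(data)
    half = len(data) // 2
    train, test = data[:half], data[half:]
    forest = fit_forest([x for x, _ in train], [lab for _, lab in train], rng)
    errors = sum(predict(forest, x) != lab for x, lab in test)
    # Under H0 with equal sample sizes the classifier cannot beat chance, so the
    # error count is treated as roughly Binomial(n_test, 1/2); small is evidence.
    return errors / len(test), binom_cdf(errors, len(test))


# Synthetic demo data (assumed for illustration): 3-dimensional Gaussians.
gen = random.Random(1)
base = [[gen.gauss(0, 1) for _ in range(3)] for _ in range(100)]
same = [[gen.gauss(0, 1) for _ in range(3)] for _ in range(100)]
shifted = [[gen.gauss(1.5, 1) for _ in range(3)] for _ in range(100)]

err_null, p_null = classifier_two_sample_test(base, same)
err_shift, p_shift = classifier_two_sample_test(base, shifted)
print("same distribution:    error=%.2f  p=%.4f" % (err_null, p_null))
print("shifted distribution: error=%.2f  p=%.2e" % (err_shift, p_shift))
```

The design choice to evaluate the error on held-out data keeps the null distribution simple. An out-of-bag error combined with a permutation test is another natural option with a forest, but it is not shown here.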
