ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R

Marvin N. Wright; Andreas Ziegler

arXiv:1508.04409·stat.ML·May 18, 2018

TL;DR

ranger is a C++ application and R package that implements random forests for high dimensional data. In the authors' comparisons it is faster and more memory efficient than the other implementations tested, which makes it well suited to data on the scale of a genome-wide association study.

Contribution

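The paper introduces ranger, a C++ application and R package for random forests that supports ensembles of classification, regression and survival trees. It describes the implementation, validates it against a reference implementation, and compares its runtime and memory usage with other implementations.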

Findings

01

In the authors' runtime comparison, ranger is the fastest implementation for data on the scale of a genome-wide association study.

02

In the authors' memory comparison, ranger is the most memory efficient implementation for data on the same scale.

03

ranger scales best with the number of features, samples, trees, and features tried for splitting.

Abstract

We introduce the C++ application and R package ranger. The software is a fast implementation of random forests for high dimensional data. Ensembles of classification, regression and survival trees are supported. We describe the implementation, provide examples, validate the package with a reference implementation, and compare runtime and memory usage with other implementations. The new software proves to scale best with the number of features, samples, trees, and features tried for splitting. Finally, we show that ranger is the fastest and most memory efficient implementation of random forests to analyze data on the scale of a genome-wide association study.
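
The four quantities the abstract names as scaling dimensions (features, samples, trees, and features tried for splitting) are easier to see in code. The following is a minimal Python sketch of a classification forest built from one-split trees (stumps). It is an illustration only and not ranger's implementation or API. All names here (`gini`, `best_split`, `grow_stump`, `grow_forest`, `predict`, `num_trees`, `mtry`) are hypothetical, and the Gini impurity criterion and the exhaustive threshold search are assumptions chosen for brevity.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def best_split(rows, labels, candidate_features):
    """Return the (feature, threshold) pair with the lowest weighted Gini
    impurity among the candidate features, or None if no split separates
    the rows into two non-empty groups."""
    best, best_score = None, float("inf")
    n = len(rows)
    for f in candidate_features:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_stump(rows, labels, mtry, rng):
    """Grow a one-split tree on a bootstrap sample, trying mtry features."""
    n, p = len(rows), len(rows[0])
    # Bootstrap: draw n samples with replacement.
    idx = [rng.randrange(n) for _ in range(n)]
    b_rows = [rows[i] for i in idx]
    b_labels = [labels[i] for i in idx]
    # Only mtry randomly chosen features are tried for splitting.
    features = rng.sample(range(p), mtry)
    split = best_split(b_rows, b_labels, features)
    if split is None:
        return ("leaf", majority(b_labels))
    f, t = split
    left = majority([y for row, y in zip(b_rows, b_labels) if row[f] <= t])
    right = majority([y for row, y in zip(b_rows, b_labels) if row[f] > t])
    return ("split", f, t, left, right)


def grow_forest(rows, labels, num_trees, mtry, seed=0):
    """Grow num_trees stumps; the work grows with every one of the four
    quantities: samples, features, trees, and mtry."""
    rng = random.Random(seed)
    return [grow_stump(rows, labels, mtry, rng) for _ in range(num_trees)]


def predict(forest, row):
    """Classify one row by majority vote over all trees."""
    votes = []
    for tree in forest:
        if tree[0] == "leaf":
            votes.append(tree[1])
        else:
            _, f, t, left, right = tree
            votes.append(left if row[f] <= t else right)
    return majority(votes)
```

Calling `grow_forest(rows, labels, num_trees=100, mtry=1)` on a list of numeric feature rows returns a list of trees that `predict(forest, row)` can vote over. A real implementation grows full-depth trees rather than stumps and, as the abstract indicates for ranger, also handles regression and survival outcomes.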
