To tune or not to tune the number of trees in random forest?
Philipp Probst, Anne-Laure Boulesteix

TL;DR
This paper investigates how the number of trees in a random forest affects performance. It shows that the expected error rate can be a non-monotonous function of the number of trees, and it argues for setting the number to a large, computationally feasible value.
Contribution
It provides theoretical insights into the non-monotonous behavior of error rates with respect to the number of trees and offers practical guidelines for choosing this number.
Findings
The expected classification error rate may first decrease and then increase as more trees are added.
Other performance measures, such as the Brier score, cannot show such non-monotonous patterns.
Large-scale dataset analysis supports setting a large, computationally feasible number of trees.
Abstract
The number of trees T in the random forest (RF) algorithm for supervised learning has to be set by the user. It is controversial whether T should simply be set to the largest computationally manageable value or whether a smaller T may in some cases be better. While the principle underlying bagging is that "more trees are better", in practice the classification error rate sometimes reaches a minimum before increasing again for increasing number of trees. The goal of this paper is four-fold: (i) providing theoretical results showing that the expected error rate may be a non-monotonous function of the number of trees and explaining under which circumstances this happens; (ii) providing theoretical results showing that such non-monotonous patterns cannot be observed for other performance measures such as the Brier score and the logarithmic loss (for classification) and the mean squared…
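The contrast between the error rate and the Brier score can be illustrated with a small numerical sketch. It rests on simplifying assumptions that are not taken from the paper: binary classification, majority voting over an odd number of trees, and tree votes that are independent for a given observation, each wrong with a fixed probability `eps`. The mixture of "easy" and "hard" observations and all function names are hypothetical choices made for illustration only.

```python
from math import comb

def majority_error(eps: float, n_trees: int) -> float:
    """Probability that more than half of n_trees independent trees are wrong.

    Assumes n_trees is odd, so majority voting never ties.
    """
    first_losing_count = n_trees // 2 + 1
    return sum(
        comb(n_trees, k) * eps**k * (1 - eps) ** (n_trees - k)
        for k in range(first_losing_count, n_trees + 1)
    )

def expected_brier(eps: float, n_trees: int) -> float:
    """Expected squared error of the vote fraction for the true class.

    The fraction of wrong votes has mean eps and variance
    eps * (1 - eps) / n_trees, so the expectation is eps^2 plus that variance.
    """
    return eps**2 + eps * (1 - eps) / n_trees

# Hypothetical population: 90% easy observations (each tree wrong with
# probability 0.1) and 10% hard observations (each tree wrong with 0.6).
mixture = [(0.9, 0.1), (0.1, 0.6)]
tree_counts = list(range(1, 202, 2))  # odd values 1, 3, ..., 201

error_curve = [
    sum(weight * majority_error(eps, t) for weight, eps in mixture)
    for t in tree_counts
]
brier_curve = [
    sum(weight * expected_brier(eps, t) for weight, eps in mixture)
    for t in tree_counts
]

best_t = tree_counts[error_curve.index(min(error_curve))]
print("error rate is minimised at T =", best_t)
print("error rate at T = 1, best T, 201:",
      error_curve[0], min(error_curve), error_curve[-1])
print("Brier score at T = 1, 201:", brier_curve[0], brier_curve[-1])
```

Under these assumptions, the error on easy observations drops toward 0 quickly, while the error on hard observations creeps toward 1. The average therefore dips and then rises again, whereas the Brier term `eps * (1 - eps) / n_trees` can only shrink as trees are added.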
Taxonomy
Topics: Machine Learning and Data Classification · Data Mining Algorithms and Applications · Machine Learning and Algorithms
