# Cost-complexity pruning of random forests

**Authors:** Kiran Bangalore Ravi, Jean Serra

arXiv: 1703.05430 · 2017-07-20

## TL;DR

This paper uses out-of-bag samples to post-prune the decision trees within a random forest, aiming to reduce model complexity while maintaining accuracy. A preliminary empirical study on four UCI datasets supports the approach.

## Contribution

The paper applies cost-complexity pruning to the individual trees of a random forest, using each tree's out-of-bag samples, with the aim of improving generalization.
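
For orientation, the standard CART cost-complexity criterion is given here. It is the textbook formulation, not an equation quoted from this paper, and the symbols are chosen for illustration: $R(T)$ is the error of tree $T$, $|\widetilde{T}|$ its number of leaves, and $\alpha \ge 0$ the penalty per leaf.

$$
R_\alpha(T) = R(T) + \alpha\,|\widetilde{T}|
$$

For a given $\alpha$, pruning keeps the subtree that minimizes $R_\alpha$. Setting $\alpha = 0$ keeps the fully grown tree, and larger values yield progressively smaller subtrees.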

## Key findings

- Forest size decreases without considerable loss in accuracy
- The decrease is consistent across the four UCI datasets studied
- Out-of-bag samples can drive post-pruning of the trees in a random forest (preliminary evidence)

## Abstract

Random forests perform bootstrap aggregation by sampling the training samples with replacement. This enables the evaluation of the out-of-bag error, which serves as an internal cross-validation mechanism. Our motivation lies in using the unsampled training samples to improve each decision tree in the ensemble. We study the effect of using the out-of-bag samples to improve the generalization error, first of the decision trees and second of the random forest, by post-pruning. A preliminary empirical study on four UCI repository datasets shows a consistent decrease in the size of the forests without considerable loss in accuracy.
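
The idea in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the out-of-bag samples act as a validation set for choosing each tree's `ccp_alpha`, it breaks ties in favour of the smaller tree, and it uses the bundled breast-cancer dataset purely as a convenient example (the summary does not say which four UCI datasets the paper used). The helper name `fit_oob_pruned_tree` is hypothetical.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier


def fit_oob_pruned_tree(X, y, rng, seed):
    """Fit one bootstrap tree, then choose ccp_alpha by out-of-bag accuracy.

    Assumption: OOB accuracy is the selection criterion; the paper may differ.
    """
    n = len(X)
    in_bag = rng.integers(0, n, size=n)        # bootstrap: draw n indices with replacement
    oob = np.setdiff1d(np.arange(n), in_bag)   # indices never drawn (roughly 37% of n)
    X_in, y_in = X[in_bag], y[in_bag]

    def make_tree(alpha):
        # Same seed every time, so each candidate is a pruned version of one tree.
        return DecisionTreeClassifier(
            max_features="sqrt", random_state=seed, ccp_alpha=alpha
        )

    full = make_tree(0.0).fit(X_in, y_in)

    # Candidate alphas from the cost-complexity pruning path of the in-bag tree.
    path = full.cost_complexity_pruning_path(X_in, y_in)
    alphas = np.unique(np.clip(path.ccp_alphas, 0.0, None))  # sorted, non-negative

    best, best_acc = full, full.score(X[oob], y[oob])
    for alpha in alphas:
        candidate = make_tree(alpha).fit(X_in, y_in)
        acc = candidate.score(X[oob], y[oob])
        if acc >= best_acc:                    # ties go to the larger alpha (smaller tree)
            best, best_acc = candidate, acc
    return full, best


X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
pairs = [fit_oob_pruned_tree(X, y, rng, seed) for seed in range(10)]

full_nodes = sum(full.tree_.node_count for full, _ in pairs)
pruned_nodes = sum(pruned.tree_.node_count for _, pruned in pairs)
print(f"nodes before pruning: {full_nodes}, after OOB pruning: {pruned_nodes}")
```

Because every candidate is a subtree of the same fully grown tree, the pruned forest can never have more nodes than the unpruned one. Whether accuracy on unseen data is preserved has to be checked on a separate test set, which this sketch leaves out.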

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.05430/full.md

## Figures

22 figures with captions in the complete paper: https://tomesphere.com/paper/1703.05430/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/1703.05430/full.md

---
Source: https://tomesphere.com/paper/1703.05430