# A network flow approach to visualising the roles of covariates in random forests

**Authors:** Benjamin R. Fitzpatrick, Kerrie Mengersen

arXiv: 1706.08702 · 2017-06-28

## TL;DR

This paper proposes new applications of parallel coordinates plots and Sankey diagrams. They summarise the frequencies of all paths through all trees in a random forest, which makes the hierarchies of interacting covariate effects, and the order in which covariates appear in them, easier to interpret.

## Contribution

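The paper presents visualisations that show how often each hierarchy of covariate effects occurs across a random forest and the order in which covariates appear within those hierarchies. Existing summaries do not convey this information. These summaries are ranked importance charts, line graphs of covariate effects on predictions, pairwise interaction heatmaps, and per-class parallel coordinates plots.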

## Key findings

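- The proposed parallel coordinates plots and Sankey diagrams reveal hierarchies of interacting covariate effects, how often each occurs across the forest, and the order of covariates within them.
- These visualisations complement existing summaries and improve the interpretability of random forest models.
- The methods are demonstrated on a random forest fitted to publicly available data.
- A software implementation is available as an R package.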

## Abstract

We propose novel applications of parallel coordinates plots and Sankey diagrams to represent the hierarchies of interacting covariate effects in random forests. Each visualisation summarises the frequencies of all of the paths through all of the trees in a random forest. Visualisations of the roles of covariates in random forests include: ranked bar or dot charts depicting scalar metrics of the contributions of individual covariates to the predictive accuracy of the random forest; line graphs depicting various summaries of the effect of varying a particular covariate on the predictions from the random forest; heatmaps of metrics of the strengths of interactions between all pairs of covariates; and parallel coordinates plots for each response class depicting the distributions of the values of all covariates among the observations most representative of those predicted to belong to that class. Together these visualisations facilitate substantial insights into the roles of covariates in a random forest but do not communicate the frequencies of the hierarchies of covariate effects across the random forest or the orders in which covariates occur in these hierarchies. Our visualisations address these gaps. We demonstrate our visualisations using a random forest fitted to publicly available data and provide a software implementation in the form of an R package.
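The following minimal Python sketch illustrates what "the frequencies of all of the paths through all of the trees" can mean. It is not the authors' R implementation. It assumes each tree arrives as three parallel arrays in the layout scikit-learn exposes on a fitted tree's `tree_` attribute (`children_left`, `children_right`, `feature`, with `-1` marking a leaf's missing children). The sketch records the sequence of split covariates on each root-to-leaf path, counts identical sequences across the forest, and aggregates them into weighted depth-to-depth links of the kind a Sankey diagram draws. The function names, the toy forest, and the choice to count one path per leaf are illustrative assumptions. The paper may define paths and their frequencies differently.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

TREE_LEAF = -1  # child index meaning "no child" (the sentinel scikit-learn uses)

Path = Tuple[str, ...]  # covariate names in the order they are split on


def tree_paths(children_left: Sequence[int], children_right: Sequence[int],
               feature: Sequence[int], names: Sequence[str]) -> List[Path]:
    """Return the split-covariate sequence of every root-to-leaf path (one per leaf)."""
    paths: List[Path] = []
    stack: List[Tuple[int, Path]] = [(0, ())]  # (node id, covariates seen so far)
    while stack:
        node, prefix = stack.pop()
        if children_left[node] == TREE_LEAF:  # leaf reached: path is complete
            paths.append(prefix)
            continue
        step = prefix + (names[feature[node]],)  # append this node's split covariate
        stack.append((children_left[node], step))
        stack.append((children_right[node], step))
    return paths


def forest_path_counts(trees, names: Sequence[str]) -> Counter:
    """Count identical covariate paths across all trees (hypothetical helper)."""
    counts: Counter = Counter()
    for left, right, feat in trees:
        counts.update(tree_paths(left, right, feat, names))
    return counts


def sankey_links(counts: Counter) -> Dict[Tuple[Tuple[int, str], Tuple[int, str]], int]:
    """Aggregate path counts into weighted links (depth, covariate) -> (depth + 1, covariate)."""
    links: Counter = Counter()
    for path, n in counts.items():
        for depth in range(len(path) - 1):
            links[((depth, path[depth]), (depth + 1, path[depth + 1]))] += n
    return dict(links)


# Toy forest of two hand-built trees (illustrative data, not from the paper).
NAMES = ["x1", "x2", "x3"]
TREES = [
    # root splits on x1; its left child splits on x2; right child is a leaf
    ([1, 2, -1, -1, -1], [4, 3, -1, -1, -1], [0, 1, -2, -2, -2]),
    # root splits on x2; its left child splits on x1; right child is a leaf
    ([1, 2, -1, -1, -1], [4, 3, -1, -1, -1], [1, 0, -2, -2, -2]),
]

counts = forest_path_counts(TREES, NAMES)
print(counts)
print(sankey_links(counts))
```

With scikit-learn, the input for a fitted `RandomForestClassifier` named `rf` could be built as `[(e.tree_.children_left, e.tree_.children_right, e.tree_.feature) for e in rf.estimators_]`. A parallel coordinates plot can draw each counted path as one polyline across depth axes. The output of `sankey_links` gives flow widths between consecutive depths. In the toy forest, the links x1 to x2 and x2 to x1 remain separate, so the order of covariates within each hierarchy is preserved.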

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.08702/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1706.08702/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/1706.08702/full.md

---
Source: https://tomesphere.com/paper/1706.08702