Cluster-Based Random Forest Visualization and Interpretation
Max Sondag, Christofer Meinecke, Dennis Collaris, Tatiana von Landesberger, Stef van den Elzen

TL;DR
This paper introduces a visualization system for random forests that clusters similar trees using a new distance metric, so users can interpret the model without analyzing each tree individually.
Contribution
The paper proposes a new clustering method for decision trees in random forests and two visualization techniques to improve model interpretability.
Findings
Effective clustering of decision trees based on rules and predictions
Visualization methods reveal feature importance and decision rules
Case study demonstrates improved interpretability
Abstract
Random forests are a machine learning method used to automatically classify datasets and consist of a multitude of decision trees. While these random forests often have higher performance and generalize better than a single decision tree, they are also harder to interpret. This paper presents a visualization method and system to increase the interpretability of random forests. We cluster similar trees, which enables users to interpret how the model performs in general without needing to analyze each individual decision tree in detail or to interpret an oversimplified summary of the full forest. To meaningfully cluster the decision trees, we introduce a new distance metric that takes into account both the decision rules and the predictions of a pair of decision trees. We also propose two new visualization methods that visualize both clustered and individual decision trees: (1) The…
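To make the idea of a distance that combines decision rules and predictions more concrete, here is a minimal Python sketch. It is not the metric defined in the paper, whose exact formulation is not given in this summary. As stand-ins, it assumes a Jaccard distance over the features each tree splits on (a rough proxy for rule similarity), the fraction of samples on which two trees disagree (for predictions), an equal weighting `alpha`, and simple threshold-based single-linkage grouping. The tree representation, function names, feature names, and all numeric values are hypothetical.

```python
from itertools import combinations


def rule_distance(rules_a, rules_b):
    """Jaccard distance between the sets of features two trees split on.

    Assumption: this is only a proxy for rule similarity, not the paper's metric.
    """
    feats_a = {feature for feature, _ in rules_a}
    feats_b = {feature for feature, _ in rules_b}
    union = feats_a | feats_b
    if not union:
        return 0.0
    return 1.0 - len(feats_a & feats_b) / len(union)


def prediction_distance(preds_a, preds_b):
    """Fraction of samples on which the two trees predict different classes."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if not preds_a:
        return 0.0
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def tree_distance(tree_a, tree_b, alpha=0.5):
    """Weighted mix of rule and prediction distance (alpha is an assumed weight)."""
    return (alpha * rule_distance(tree_a["rules"], tree_b["rules"])
            + (1 - alpha) * prediction_distance(tree_a["preds"], tree_b["preds"]))


def cluster_trees(trees, threshold=0.3, alpha=0.5):
    """Group trees linked by a chain of pairwise distances <= threshold."""
    parent = list(range(len(trees)))

    def find(i):
        # Union-find lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(trees)), 2):
        if tree_distance(trees[i], trees[j], alpha) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(trees)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical toy forest: each tree lists its (feature, threshold) splits
# and its predictions on the same four samples.
trees = [
    {"rules": [("age", 30), ("income", 50000)], "preds": [0, 1, 1, 0]},
    {"rules": [("age", 35), ("income", 48000)], "preds": [0, 1, 1, 1]},
    {"rules": [("height", 170)], "preds": [1, 0, 0, 1]},
]

print(cluster_trees(trees))
```

In this toy forest the first two trees split on the same features and disagree on one sample out of four, so they fall into one cluster, while the third tree stands alone. A rule-aware metric would likely also compare thresholds and tree structure, which this sketch ignores.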
