Forest-Guided Clustering -- Shedding Light into the Random Forest Black Box
Lisa Barros de Andrade e Sousa, Gregor Miller, Ronan Le Gleut, Dominik Thalmeier, Helena Pelin, Marie Piraud

TL;DR
Forest-Guided Clustering (FGC) is a novel explainability method for Random Forests that reveals internal structure and provides interpretable clusters, improving understanding of model decisions and uncovering meaningful subpopulations in biological data.
Contribution
FGC introduces a model-specific clustering approach that uncovers both local and global structure in RFs, enhancing interpretability and biological insight beyond traditional feature attribution methods.
Findings
FGC accurately recovered latent subclass structure on a benchmark dataset.
FGC uncovered biologically coherent subpopulations in an AML transcriptomic dataset.
FGC outperformed classical clustering and post-hoc explanation methods.
Abstract
As machine learning models are increasingly deployed in sensitive application areas, the demand for interpretable and trustworthy decision-making has increased. Random Forests (RF), despite their widespread use and strong performance on tabular data, remain difficult to interpret due to their ensemble nature. We present Forest-Guided Clustering (FGC), a model-specific explainability method that reveals both local and global structure in RFs by grouping instances according to shared decision paths. FGC produces human-interpretable clusters aligned with the model's internal logic and computes cluster-specific and global feature importance scores to derive decision rules underlying RF predictions. FGC accurately recovered latent subclass structure on a benchmark dataset and outperformed classical clustering and post-hoc explanation methods. Applied to an AML transcriptomic dataset, FGC…
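The idea of "grouping instances according to shared decision paths" can be illustrated with a small sketch. It assumes that path similarity is measured by the standard Random Forest proximity (the fraction of trees in which two samples land in the same leaf) and that clusters are then formed with a naive k-medoids on the distance 1 - proximity. Neither choice is confirmed by the summary above. The function names, the farthest-point initialization, and the toy leaf indices are all hypothetical. With scikit-learn, a leaf-index table of this shape could be obtained from `RandomForestClassifier.apply(X)`.

```python
from itertools import combinations


def proximity_matrix(leaves):
    """leaves[i][t] is the leaf index that sample i reaches in tree t."""
    n, n_trees = len(leaves), len(leaves[0])
    prox = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
        prox[i][j] = prox[j][i] = shared / n_trees
    return prox


def k_medoids(dist, k, n_iter=100):
    """Naive alternating k-medoids on a precomputed distance matrix."""
    n = len(dist)
    # Farthest-point initialization (an assumption made for this sketch).
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    labels = [0] * n
    for _ in range(n_iter):
        # Assign every sample to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Move each medoid to the member with the smallest total distance.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda m: sum(dist[m][i] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


# Toy leaf indices for 6 samples and 3 trees (made-up values for illustration).
leaves = [
    [1, 4, 2], [1, 4, 2], [1, 4, 3],
    [2, 5, 7], [2, 5, 7], [2, 6, 7],
]
prox = proximity_matrix(leaves)
dist = [[1.0 - p for p in row] for row in prox]
labels, medoids = k_medoids(dist, k=2)
print(labels)
```

Because the distance is derived from the trained forest rather than from raw feature values, samples are grouped by how the model treats them, which is what distinguishes this kind of clustering from running a classical method directly on the input features.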
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Gene expression and cancer classification · Single-cell and spatial transcriptomics
