On Explaining Random Forests with SAT
Yacine Izza, Joao Marques-Silva

TL;DR
This paper proves that computing a PI-explanation of a Random Forest (RF) is D^P-complete, proposes a propositional encoding that lets a SAT solver compute such explanations, and reports experiments on real datasets in which the approach scales to large RFs and outperforms existing heuristics on most datasets.
Contribution
It establishes that computing one PI-explanation of an RF is D^P-complete and proposes a SAT-based approach that outperforms heuristic methods in practice.
Findings
SAT-based explanations scale well to large RFs
The approach outperforms existing heuristics on most datasets
Computing one PI-explanation of an RF is D^P-complete
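For readers unfamiliar with the class named in the last finding, the standard textbook definition of D^P is reproduced here as background; it is not taken from the paper's text. A language is in D^P when it is the intersection of an NP language and a coNP language:

```latex
\[
  \mathrm{D}^{\mathrm{P}} \;=\; \{\, L_1 \cap L_2 \;:\; L_1 \in \mathrm{NP},\ L_2 \in \mathrm{coNP} \,\}
\]
```

Since both NP and coNP are contained in D^P, a D^P-complete problem has no polynomial-time algorithm unless P = NP.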
Abstract
Random Forests (RFs) are among the most widely used Machine Learning (ML) classifiers. Even though RFs are not interpretable, there are no dedicated non-heuristic approaches for computing explanations of RFs. Moreover, there is recent work on polynomial algorithms for explaining ML models, including naive Bayes classifiers. Hence, one question is whether finding explanations of RFs can be solved in polynomial time. This paper answers this question negatively, by proving that computing one PI-explanation of an RF is D^P-complete. Furthermore, the paper proposes a propositional encoding for computing explanations of RFs, thus enabling finding PI-explanations with a SAT solver. This contrasts with earlier work on explaining boosted trees (BTs) and neural networks (NNs), which requires encodings based on SMT/MILP. Experimental results, obtained on a wide range of publicly available datasets,…
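To make the notion of a PI-explanation concrete, here is a minimal Python sketch. A PI-explanation of a prediction is a subset-minimal set of features that, when fixed to the instance's values, forces the same prediction no matter how the remaining features are set. The sketch uses a deletion-based search with a brute-force sufficiency check over binary features; the brute-force check is only a stand-in for the SAT oracle the abstract describes, not the paper's encoding. The tree representation, the toy forest, the tie-breaking rule, and all function names are assumptions made for illustration.

```python
from itertools import product

# Hypothetical representation: a tree is either an int (leaf class label)
# or a tuple (feature, low, high); follow `low` when the feature is 0
# and `high` when it is 1.
def tree_predict(tree, x):
    while not isinstance(tree, int):
        feature, low, high = tree
        tree = high if x[feature] else low
    return tree

def forest_predict(forest, x):
    votes = [tree_predict(t, x) for t in forest]
    # Majority vote; ties go to the smallest label (an assumption).
    return max(sorted(set(votes)), key=votes.count)

def is_sufficient(forest, x, fixed, target):
    """Check that fixing `fixed` features to x's values forces `target`.

    Brute force over all completions of the free features; a SAT solver
    would replace this loop in a scalable implementation.
    """
    free = [i for i in range(len(x)) if i not in fixed]
    for values in product((0, 1), repeat=len(free)):
        y = list(x)
        for i, v in zip(free, values):
            y[i] = v
        if forest_predict(forest, y) != target:
            return False
    return True

def pi_explanation(forest, x):
    """Deletion-based search for one subset-minimal sufficient feature set."""
    target = forest_predict(forest, x)
    fixed = set(range(len(x)))
    for i in range(len(x)):
        fixed.discard(i)          # tentatively free feature i
        if not is_sufficient(forest, x, fixed, target):
            fixed.add(i)          # feature i is needed; keep it fixed
    return sorted(fixed)

# Toy forest over three binary features (illustrative only).
FOREST = [
    (0, 0, (1, 0, 1)),  # predicts 1 iff x0 = 1 and x1 = 1
    (0, 0, 1),          # predicts x0
    (2, 0, 1),          # predicts x2
]

if __name__ == "__main__":
    print(pi_explanation(FOREST, [1, 1, 1]))  # [0, 2]
```

For the instance `[1, 1, 1]` the forest predicts class 1, and fixing features 0 and 2 already guarantees two of the three votes, so feature 1 is dropped. The search makes one sufficiency query per feature, and each query is itself a hard decision problem for RFs, which is why the check is delegated to a solver in practice.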
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Bayesian Modeling and Causal Inference · Machine Learning and Data Classification
