Integrating White and Black Box Techniques for Interpretable Machine Learning
Eric M. Vernon, Naoki Masuyama, and Yusuke Nojima

TL;DR
This paper proposes an ensemble approach that combines interpretable white box models for simple inputs with complex black box models for difficult inputs to balance interpretability and accuracy in machine learning.
Contribution
It introduces a novel ensemble classifier that adaptively uses white and black box models based on input difficulty, enhancing interpretability without sacrificing performance.
Findings
Improved interpretability for easy inputs
Maintained high accuracy on complex inputs
Demonstrated effectiveness on benchmark datasets
Abstract
In machine learning algorithm design, there is a trade-off between the interpretability and the performance of an algorithm. In general, algorithms that are simpler and easier for humans to comprehend tend to perform worse than more complex, less transparent algorithms. For example, a random forest classifier is likely to be more accurate than a simple decision tree, but at the expense of interpretability. In this paper, we present an ensemble classifier design that classifies easier inputs using a highly interpretable classifier (i.e., a white box model) and more difficult inputs using a more powerful but less interpretable classifier (i.e., a black box model).
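The routing idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how input difficulty is measured, so this example assumes a simple rule: an input counts as "easy" when the white box model's own confidence reaches a cutoff. The class `RoutedEnsemble`, the `threshold` parameter, and the toy models `stump` and `opaque` are all hypothetical names introduced for illustration, not the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumption: a model maps a feature vector to (label, confidence in [0, 1]).
Model = Callable[[Sequence[float]], Tuple[int, float]]


@dataclass
class RoutedEnsemble:
    """Hypothetical router: use the white box when it is confident enough."""
    white_box: Model
    black_box: Model
    threshold: float = 0.8  # assumed confidence cutoff, not from the paper

    def predict_one(self, x: Sequence[float]) -> Tuple[int, str]:
        label, confidence = self.white_box(x)
        if confidence >= self.threshold:
            return label, "white"   # easy input: interpretable prediction
        label, _ = self.black_box(x)
        return label, "black"       # hard input: defer to the stronger model

    def predict(self, X: Sequence[Sequence[float]]) -> List[Tuple[int, str]]:
        return [self.predict_one(x) for x in X]


def stump(x: Sequence[float]) -> Tuple[int, float]:
    """Toy white box: one threshold rule on feature 0.
    Confidence grows with distance from the decision boundary."""
    margin = abs(x[0] - 0.5)
    return (1 if x[0] > 0.5 else 0), min(1.0, 0.5 + margin)


def opaque(x: Sequence[float]) -> Tuple[int, float]:
    """Toy stand-in for a black box: a nonlinear rule over both features."""
    score = x[0] * x[0] + x[1] - 0.6
    return (1 if score > 0 else 0), 1.0


ensemble = RoutedEnsemble(stump, opaque, threshold=0.8)
results = ensemble.predict([(0.95, 0.1), (0.05, 0.9), (0.55, 0.0)])
for label, source in results:
    print(label, source)
```

In this toy run the first two inputs lie far from the stump's boundary and are answered by the white box, while the third lies close to it and is deferred to the black box. Raising `threshold` sends more inputs to the black box; lowering it gives more predictions an interpretable explanation, which is the trade-off the abstract describes.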
Taxonomy
Topics: Explainable Artificial Intelligence (XAI)
