Interpreting Deep Learning Model Using Rule-based Method
Xiaojian Wang, Jingyuan Wang, Ke Tang

TL;DR
This paper introduces a multi-level decision framework that interprets deep neural networks by approximating them with decision trees, enabling both local and global explanations with high fidelity and interpretability.
Contribution
The paper proposes a multi-level decision structure (MLD) that approximates a neural network, together with algorithms that use the MLD for local and global interpretation.
Findings
High fidelity approximation of neural networks using MLD
Effective local explanations via decision generation and rule induction
Global feature importance extraction methods demonstrated on datasets
Abstract
Deep learning models are favored in many research and industry areas and have reached accuracy that approaches or even surpasses human level. However, researchers have long considered them black-box models because of their complicated nonlinear properties. In this paper, we propose a multi-level decision framework to provide comprehensive interpretation of deep neural network models. In this framework, a multi-level decision structure (MLD) is first constructed by fitting a decision tree for each neuron and aggregating the trees; the MLD can approximate the performance of the target neural network model with high efficiency and high fidelity. For local explanation of a sample, two algorithms are proposed based on the MLD structure: a forward decision generation algorithm for providing sample decisions, and a backward rule induction algorithm for…
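The construction described in the abstract (one decision tree per neuron, then aggregation) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy network weights (`W1`, `B1`, `W2`, `B2`), the binarization of each neuron into active/inactive, the greedy `fit_tree` learner, and the in-sample fidelity measure are all assumptions made for illustration only.

```python
import random

def fit_tree(X, y, depth):
    """Greedily fit a small binary classification tree (labels 0/1)."""
    majority = int(2 * sum(y) >= len(y))
    if depth == 0 or len(set(y)) == 1:
        return majority                            # leaf: majority label
    best = None
    for j in range(len(X[0])):                     # every feature
        for t in sorted({row[j] for row in X}):    # every observed threshold
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            # errors made if each side predicts its own majority label
            err = (min(sum(left), len(left) - sum(left))
                   + min(sum(right), len(right) - sum(right)))
            if best is None or err < best[0]:
                best = (err, j, t)
    if best is None:
        return majority                            # no usable split
    _, j, t = best
    lo = [(r, yi) for r, yi in zip(X, y) if r[j] <= t]
    hi = [(r, yi) for r, yi in zip(X, y) if r[j] > t]
    return (j, t,
            fit_tree([r for r, _ in lo], [yi for _, yi in lo], depth - 1),
            fit_tree([r for r, _ in hi], [yi for _, yi in hi], depth - 1))

def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

# Hypothetical target network: 2 inputs, 3 ReLU hidden neurons, 1 output.
W1 = [[1.0, -1.0], [0.5, 1.0], [-1.0, -0.5]]
B1 = [0.0, -0.2, 0.3]
W2 = [1.0, -1.5, 0.8]
B2 = -0.1

def hidden(x):
    return [max(0.0, sum(w * xi for w, xi in zip(ws, x)) + b)
            for ws, b in zip(W1, B1)]

def net_label(x):
    return int(sum(w * h for w, h in zip(W2, hidden(x))) + B2 > 0)

rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]

# Level 1: one tree per hidden neuron, predicting whether it is active.
neuron_trees = [fit_tree(X, [int(hidden(x)[k] > 0) for x in X], depth=3)
                for k in range(len(W1))]

# Level 2: one tree from the predicted neuron states to the network's label.
Z = [[predict(t, x) for t in neuron_trees] for x in X]
y = [net_label(x) for x in X]
top_tree = fit_tree(Z, y, depth=3)

def mld_predict(x):
    """Aggregate the per-neuron trees into one multi-level decision."""
    return predict(top_tree, [predict(t, x) for t in neuron_trees])

# Fidelity: fraction of the fitted samples where the tree structure
# agrees with the network (in-sample, so it is an optimistic estimate).
fidelity = sum(mld_predict(x) == net_label(x) for x in X) / len(X)
print(f"in-sample fidelity: {fidelity:.3f}")
```

Each tuple `(feature, threshold, left, right)` along a root-to-leaf path reads as one rule condition, which is the kind of structure a rule-based explanation could be read from. Fidelity here is measured on the same samples used for fitting; a held-out set would give a fairer estimate.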
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
Methods: Interpretability
