How to Explain Neural Networks: an Approximation Perspective
Hangcheng Dong, Bingguo Liu, Fengdong Chen, Dong Ye, Guodong Liu

TL;DR
This paper introduces an approach to interpreting neural networks that is grounded in approximation theory, and proposes a multilayer perceptron (MLP) as a universal interpreter for black-box models. The authors report extensive experiments validating its effectiveness.
Contribution
The paper presents a novel approximation perspective on interpretability and introduces the MLP as a universal tool for explaining various neural network models.
Findings
Effective explanation of neural networks using approximation theory
MLP successfully interprets diverse black-box models
Experimental results demonstrate the approach's robustness
Abstract
The lack of interpretability has hindered the large-scale adoption of AI technologies. However, the fundamental idea of interpretability, as well as how to put it into practice, remains unclear. In this study, we provide notions of interpretability based on approximation theory. We first implement this approximation interpretation on a specific model (a fully connected neural network) and then propose using a multilayer perceptron (MLP) as a universal interpreter to explain arbitrary black-box models. Extensive experiments demonstrate the effectiveness of our approach.
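To make the idea of an MLP acting as an interpreter for a black-box model concrete, the sketch below fits a small one-hidden-layer MLP to the outputs of an opaque function and measures how closely it approximates them. This is a generic surrogate-fitting illustration, not the authors' procedure: the abstract does not specify their architecture, training setup, or how explanations are read off the fitted MLP. The names `black_box` and `fit_mlp_surrogate`, the toy target function, and all hyperparameters are assumptions chosen for the example.

```python
import numpy as np


def black_box(x):
    # Hypothetical stand-in for an opaque model: we only query its outputs.
    return np.sin(3.0 * x[:, 0]) + 0.5 * x[:, 1] ** 2


def fit_mlp_surrogate(x, y, hidden=32, lr=0.02, steps=4000, seed=0):
    """Fit a one-hidden-layer ReLU MLP to (x, y) with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)

    for _ in range(steps):
        # Forward pass.
        z = x @ w1 + b1
        h = np.maximum(z, 0.0)
        pred = h @ w2 + b2

        # Backward pass for the mean squared error loss.
        grad_pred = 2.0 * (pred - y) / n
        grad_w2 = h.T @ grad_pred
        grad_b2 = grad_pred.sum(axis=0)
        grad_z = (grad_pred @ w2.T) * (z > 0)
        grad_w1 = x.T @ grad_z
        grad_b1 = grad_z.sum(axis=0)

        # Gradient descent update.
        w1 -= lr * grad_w1
        b1 -= lr * grad_b1
        w2 -= lr * grad_w2
        b2 -= lr * grad_b2

    def predict(x_new):
        return (np.maximum(x_new @ w1 + b1, 0.0) @ w2 + b2).ravel()

    return predict


# Query the black box on sampled inputs, then fit the surrogate to its outputs.
rng = np.random.default_rng(1)
x_train = rng.uniform(-1.0, 1.0, size=(512, 2))
x_test = rng.uniform(-1.0, 1.0, size=(256, 2))
surrogate = fit_mlp_surrogate(x_train, black_box(x_train))

# Fidelity check: compare the surrogate against a constant (mean) predictor.
y_test = black_box(x_test)
baseline_mse = float(np.mean((y_test - black_box(x_train).mean()) ** 2))
surrogate_mse = float(np.mean((y_test - surrogate(x_test)) ** 2))
print(f"baseline MSE:  {baseline_mse:.4f}")
print(f"surrogate MSE: {surrogate_mse:.4f}")
```

The held-out error measures only how faithfully the MLP reproduces the black box on sampled inputs. How an explanation is then derived from the fitted MLP is the subject of the paper itself and is not shown here.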
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
