How to Explain Neural Networks: an Approximation Perspective

Hangcheng Dong; Bingguo Liu; Fengdong Chen; Dong Ye; Guodong Liu

arXiv:2105.07831 · cs.LG · November 18, 2021

TL;DR

This paper grounds neural network interpretability in approximation theory and proposes a multilayer perceptron (MLP) as a universal interpreter for black-box models; the authors report extensive experiments supporting the approach.

Contribution

It presents an approximation-theoretic perspective on interpretability and proposes the MLP as a universal tool for explaining arbitrary black-box models.

Findings

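1. Notions of interpretability are defined in terms of approximation theory.

2. The approximation interpretation is first implemented on a fully connected neural network, then extended by using an MLP as a universal interpreter for arbitrary black-box models.

3. Extensive experiments demonstrate the effectiveness of the approach.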

Abstract

The lack of interpretability has hindered the large-scale adoption of AI technologies. However, the fundamental idea of interpretability, as well as how to put it into practice, remains unclear. We provide notions of interpretability based on approximation theory in this study. We first implement this approximation interpretation on a specific model (fully connected neural network) and then propose to use MLP as a universal interpreter to explain arbitrary black-box models. Extensive experiments demonstrate the effectiveness of our approach.
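The idea of an MLP acting as an interpreter for a black box can be illustrated with a small surrogate-fitting sketch. This is not the authors' implementation: the abstract does not specify architectures, losses, or how explanations are read off. The example assumes a one-hidden-layer ReLU MLP trained by plain gradient descent on queries to a hypothetical `black_box` function, and it assumes that an explanation is the locally linear map the ReLU network computes around a query point. All names (`black_box`, `fit_surrogate`, `local_linear`) are illustrative.

```python
import numpy as np

def black_box(x):
    """Hypothetical opaque model: only its input/output behavior is used."""
    return np.sin(3.0 * x[:, 0]) + 0.5 * x[:, 1] ** 2

def predict(params, x):
    """Forward pass of a one-hidden-layer ReLU MLP."""
    W1, b1, W2, b2 = params
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def fit_surrogate(X, y, hidden=32, lr=0.02, steps=3000, seed=0):
    """Fit the MLP to black-box outputs with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=hidden)
    b2 = 0.0
    for _ in range(steps):
        pre = X @ W1 + b1
        act = np.maximum(pre, 0.0)
        err = (act @ W2 + b2 - y) / len(X)        # gradient of 0.5 * MSE w.r.t. predictions
        grad_act = np.outer(err, W2) * (pre > 0)  # backpropagate through the ReLU
        W2 = W2 - lr * (act.T @ err)
        b2 = b2 - lr * err.sum()
        W1 = W1 - lr * (X.T @ grad_act)
        b1 = b1 - lr * grad_act.sum(axis=0)
    return W1, b1, W2, b2

def local_linear(params, x0):
    """Return (w, b) such that the surrogate equals w @ x + b near x0."""
    W1, b1, W2, b2 = params
    mask = (x0 @ W1 + b1 > 0).astype(float)       # which hidden units are active at x0
    w = W1 @ (mask * W2)
    b = (mask * b1) @ W2 + b2
    return w, b

def mse(params, X, y):
    return float(np.mean((predict(params, X) - y) ** 2))

# Query the black box on sampled inputs, then fit the surrogate to its outputs.
X = np.random.default_rng(1).uniform(-1.0, 1.0, size=(512, 2))
y = black_box(X)
initial_mse = mse(fit_surrogate(X, y, steps=0), X, y)  # untrained surrogate, same seed
params = fit_surrogate(X, y)
final_mse = mse(params, X, y)

# Read off a local linear explanation at one query point.
x0 = np.array([0.2, -0.4])
w, b = local_linear(params, x0)
print("fidelity MSE before/after training:", initial_mse, final_mse)
print("local feature weights at x0:", w)
```

The fidelity error measures how well the surrogate approximates the black box on the sampled inputs, and the weights `w` indicate how each input feature moves the surrogate's output in the linear region containing `x0`. Whether such weights are a faithful explanation depends on how small the approximation error is in that region.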

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification