Interpretable Disentanglement of Neural Networks by Extracting Class-Specific Subnetwork

Yulong Wang; Xiaolin Hu; Hang Su

arXiv:1910.02673·cs.LG·October 8, 2019·1 cites

Open Access

TL;DR

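This paper introduces a method for extracting class-specific subnetworks from a trained neural network. Each subnetwork is more compact than the full model while keeping comparable prediction performance, and substituting it for the full model improves gradient-based visual explanation and confidence score-based adversarial example detection.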

Contribution

It presents a new way to disentangle a neural network into class-specific functional subnetworks, one per semantic class, each with a compressed structure and prediction performance comparable to the full model.

Findings

01

The structure representations of the extracted subnetworks mirror the semantic similarities between their classes.

02

Replacing the full model with class-specific subnetworks improves explanation saliency accuracy for gradient-based explanation methods.

03

The same replacement increases the detection rate of confidence score-based adversarial example detection methods.

Abstract

We propose a novel perspective to understand deep neural networks in an interpretable disentanglement form. For each semantic class, we extract a class-specific functional subnetwork from the original full model, with compressed structure while maintaining comparable prediction performance. The structure representations of extracted subnetworks display a resemblance to their corresponding class semantic similarities. We also apply extracted subnetworks in visual explanation and adversarial example detection tasks by merely replacing the original full model with class-specific subnetworks. Experiments demonstrate that this intuitive operation can effectively improve explanation saliency accuracy for gradient-based explanation methods, and increase the detection rate for confidence score-based adversarial example detection methods.
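The abstract does not say how the subnetworks are extracted, so the following is only a toy sketch of the general idea, not the authors' procedure. It assumes a subnetwork can be represented as a 0/1 gate over the hidden units of a small two-layer network. The selection rule in `class_gate` (keep the units with the highest mean activation on a class's samples), the `is_suspicious` threshold check, and all other names and values are hypothetical choices made for illustration.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(W1, W2, x, gate=None):
    """Two-layer network; `gate` is an optional 0/1 mask over hidden units.

    Passing a class gate plays the role of "replacing the full model
    with a class-specific subnetwork".
    """
    h = relu(matvec(W1, x))
    if gate is not None:
        h = [g * a for g, a in zip(gate, h)]
    return matvec(W2, h)


def class_gate(W1, samples, keep):
    """Hypothetical stand-in for subnetwork extraction: keep the `keep`
    hidden units with the highest mean activation on `samples`."""
    n_hidden = len(W1)
    mean_act = [0.0] * n_hidden
    for x in samples:
        h = relu(matvec(W1, x))
        mean_act = [m + a / len(samples) for m, a in zip(mean_act, h)]
    top = sorted(range(n_hidden), key=lambda i: mean_act[i], reverse=True)[:keep]
    return [1 if i in top else 0 for i in range(n_hidden)]


def cosine(a, b):
    """Cosine similarity between two gate vectors (structure representations)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def is_suspicious(probs, threshold=0.5):
    """Confidence score check: flag inputs whose top probability is low.
    The threshold value is an arbitrary assumption."""
    return max(probs) < threshold


# Toy demo with random weights and made-up samples for two classes.
random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(8)]
W2 = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(2)]
class_a = [[random.uniform(0, 1) for _ in range(4)] for _ in range(5)]
class_b = [[random.uniform(-1, 0) for _ in range(4)] for _ in range(5)]

gate_a = class_gate(W1, class_a, keep=4)
gate_b = class_gate(W1, class_b, keep=4)
print("gate similarity:", cosine(gate_a, gate_b))

probs = softmax(forward(W1, W2, class_a[0], gate=gate_a))
print("subnetwork confidence:", max(probs), "suspicious:", is_suspicious(probs))
```

Comparing `cosine(gate_a, gate_b)` across class pairs is one simple way to ask whether structurally similar subnetworks belong to semantically similar classes. Running the confidence check on the gated forward pass rather than the full one corresponds to the substitution the abstract describes.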

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications