Distilling Critical Paths in Convolutional Neural Networks

Fuxun Yu; Zhuwei Qin; Xiang Chen

arXiv:1811.02643 · cs.CV · November 9, 2018 · 20 cites

TL;DR

This paper analyzes the internal workings of convolutional neural networks to identify critical information pathways and introduces a distillation method that customizes and compresses models for resource-constrained deployment.

Contribution

It reveals class-specific critical paths in CNNs and proposes a distillation technique to create smaller, efficient models tailored to specific tasks.

Findings

01

Critical paths vary across classes and are highly task-specific.

02

The proposed distillation method significantly reduces model size and computation.

03

Customized models maintain high accuracy on target tasks.

Abstract

Neural network compression and acceleration are in wide demand because most deployment targets are resource-constrained. In this paper, by analyzing filter activations and gradients and by visualizing the filters' functionality in convolutional neural networks, we show that filters in higher layers learn extremely task-specific features that are exclusive to only a small subset of the overall tasks, or even to a single class. Based on these findings, we reveal the critical paths of information flow for different classes. Exploiting the intrinsic exclusiveness of these paths, we propose a critical path distillation method that effectively customizes a convolutional neural network into a compact one with a much smaller model size and less computation.
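
The abstract does not give a concrete selection criterion, so what follows is only a minimal sketch, under stated assumptions, of how class-specific critical paths might be extracted and reused for customization. It assumes that a filter's importance for a class is the mean of |activation × gradient| over that class's samples (a first-order, Taylor-style score chosen here for illustration, not necessarily the paper's criterion). It keeps the top `keep_ratio` fraction of filters per layer as that class's critical path and retains the union of paths over the target classes. The names `filter_scores`, `critical_filters`, `distill_masks`, `kept_fraction`, and `keep_ratio` are all hypothetical.

```python
from typing import Dict, List, Sequence, Set


def filter_scores(activations: Sequence[Sequence[float]],
                  gradients: Sequence[Sequence[float]]) -> List[float]:
    """Mean |activation * gradient| per filter over one class's samples.

    Assumed inputs: activations[i][f] is the pooled activation of filter f
    on sample i; gradients[i][f] is the gradient of the class score with
    respect to that activation.
    """
    n = len(activations)
    num_filters = len(activations[0])
    totals = [0.0] * num_filters
    for act_row, grad_row in zip(activations, gradients):
        for f in range(num_filters):
            totals[f] += abs(act_row[f] * grad_row[f])
    return [t / n for t in totals]


def critical_filters(scores: Sequence[float], keep_ratio: float) -> Set[int]:
    """Indices of the top-scoring filters in one layer for one class."""
    k = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
    return set(ranked[:k])


def distill_masks(layer_scores: List[Dict[int, List[float]]],
                  target_classes: Sequence[int],
                  keep_ratio: float) -> List[Set[int]]:
    """Per layer, keep the union of the target classes' critical filters."""
    masks: List[Set[int]] = []
    for per_class in layer_scores:
        kept: Set[int] = set()
        for c in target_classes:
            kept |= critical_filters(per_class[c], keep_ratio)
        masks.append(kept)
    return masks


def kept_fraction(masks: List[Set[int]], filters_per_layer: List[int]) -> float:
    """Fraction of all filters that the customized model retains."""
    return sum(len(m) for m in masks) / sum(filters_per_layer)


if __name__ == "__main__":
    # Toy per-class scores for two layers with four filters each (made up).
    scores = [
        {0: [0.9, 0.1, 0.8, 0.0], 1: [0.1, 0.7, 0.2, 0.6]},
        {0: [0.0, 0.3, 0.9, 0.4], 1: [0.8, 0.1, 0.7, 0.0]},
    ]
    for targets in ([0], [0, 1]):
        masks = distill_masks(scores, targets, keep_ratio=0.5)
        print(targets, masks, kept_fraction(masks, [4, 4]))
```

If higher-layer filters are as class-exclusive as the paper reports, the union for a small target subset stays small. A practical pipeline would presumably then physically remove the unkept filters and possibly fine-tune the resulting smaller network.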

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning