Kernel Based Progressive Distillation for Adder Neural Networks

Yixing Xu; Chang Xu; Xinghao Chen; Wei Zhang; Chunjing Xu; Yunhe Wang

arXiv:2009.13044·cs.CV·October 16, 2020·27 cites

TL;DR

This paper introduces a kernel-based progressive distillation method that improves the accuracy of Adder Neural Networks (ANNs), which are energy-efficient but typically less accurate than conventional CNNs. A CNN teacher guides the ANN student in a transformed feature space.

Contribution

The paper proposes a novel kernel-based progressive knowledge distillation technique to improve ANN accuracy without increasing parameters, addressing optimization challenges in adder networks.

Findings

1. ANN-50 with PKKD achieves 76.8% top-1 accuracy on ImageNet.

2. PKKD outperforms standard training methods for ANNs.

3. The method effectively reduces the accuracy gap between ANNs and CNNs.

Abstract

Adder Neural Networks (ANNs), which contain only additions, offer a new way of developing deep neural networks with low energy consumption. Unfortunately, accuracy drops when all convolution filters are replaced by adder filters. The main reason is the difficulty of optimizing ANNs with the $ℓ_{1}$-norm, for which the gradient estimate in back-propagation is inaccurate. In this paper, we present a novel method for further improving the performance of ANNs without increasing the number of trainable parameters: progressive kernel based knowledge distillation (PKKD). A convolutional neural network (CNN) with the same architecture is simultaneously initialized and trained as a teacher network, and the features and weights of the ANN and CNN are transformed to a new space to eliminate the accuracy drop. The similarity is measured in a higher-dimensional space to disentangle the…
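To make the contrast between convolution filters and adder filters concrete, the sketch below compares the two responses on a toy input. It assumes the usual AdderNet formulation, in which an adder filter outputs the negative $ℓ_{1}$ distance between the filter and each input patch. The function names (`conv_response`, `adder_response`, `slide`) and the toy values are hypothetical and not taken from the paper; this is an illustration, not the authors' implementation.

```python
from typing import Callable, List

Grid = List[List[float]]


def conv_response(patch: Grid, kernel: Grid) -> float:
    """Convolution-style response: sum of elementwise products (needs multiplications)."""
    return sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))


def adder_response(patch: Grid, kernel: Grid) -> float:
    """Adder-style response (assumed AdderNet form): negative l1 distance.

    Only subtraction, absolute value and addition are used.
    """
    return -sum(abs(p - k) for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))


def slide(image: Grid, kernel: Grid, response: Callable[[Grid, Grid], float]) -> Grid:
    """Apply `response` to every valid patch (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out: Grid = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            patch = [r[j:j + kw] for r in image[i:i + kh]]
            row.append(response(patch, kernel))
        out.append(row)
    return out


# Toy data, chosen only for illustration.
image = [[1.0, 2.0, 0.0],
         [0.0, 1.0, 3.0],
         [2.0, 1.0, 1.0]]
kernel = [[1.0, 0.0],
          [0.0, 1.0]]

conv_map = slide(image, kernel, conv_response)    # [[2.0, 5.0], [1.0, 2.0]]
adder_map = slide(image, kernel, adder_response)  # [[-2.0, -4.0], [-4.0, -4.0]]
print(conv_map)
print(adder_map)
```

Adder responses are never positive, while convolution responses can take either sign, so the two feature maps follow different distributions. This mismatch is part of why features from a CNN teacher and an ANN student are not directly comparable without some transformation.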

Videos

Kernel Based Progressive Distillation for Adder Neural Networks · SlidesLive

Taxonomy

Topics: Advanced Image Processing Techniques · Image and Signal Denoising Methods · Neural Networks and Applications

Methods: Knowledge Distillation · Convolution