Kernel Based Progressive Distillation for Adder Neural Networks
Yixing Xu, Chang Xu, Xinghao Chen, Wei Zhang, Chunjing Xu, Yunhe Wang

TL;DR
This paper introduces a kernel-based progressive distillation method to enhance the accuracy of Adder Neural Networks, which are energy-efficient but typically less accurate than traditional CNNs, by leveraging a teacher-student training approach in a transformed feature space.
Contribution
The paper proposes a progressive kernel-based knowledge distillation (PKKD) technique that improves ANN accuracy without increasing the number of trainable parameters, addressing the optimization difficulty of adder networks.
Findings
ANN-50 with PKKD achieves 76.8% top-1 accuracy on ImageNet.
PKKD outperforms standard training methods for ANNs.
The method effectively reduces the accuracy gap between ANNs and CNNs.
Abstract
Adder Neural Networks (ANNs) which only contain additions bring us a new way of developing deep neural networks with low energy consumption. Unfortunately, there is an accuracy drop when replacing all convolution filters by adder filters. The main reason here is the optimization difficulty of ANNs using ℓ1-norm, in which the estimation of gradient in back propagation is inaccurate. In this paper, we present a novel method for further improving the performance of ANNs without increasing the trainable parameters via a progressive kernel based knowledge distillation (PKKD) method. A convolutional neural network (CNN) with the same architecture is simultaneously initialized and trained as a teacher network, features and weights of ANN and CNN will be transformed to a new space to eliminate the accuracy drop. The similarity is conducted in a higher-dimensional space to disentangle the…
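To make the contrast between convolution filters and adder filters concrete, here is a minimal sketch. It assumes the usual adder-network formulation, in which a filter's response is the negative ℓ1 distance between an input patch and the filter weights. That definition is not spelled out in this summary. The function names and toy values are hypothetical and only for illustration. This is not the PKKD training procedure itself.

```python
# Illustrative sketch only: compares a convolution-style response
# (multiply-accumulate) with an adder-style response (negative l1 distance).
# Function names and toy inputs are hypothetical, not taken from the paper.

def conv_response(patch, kernel):
    """Cross-correlation response: sum of element-wise products."""
    return sum(p * k for p, k in zip(patch, kernel))


def adder_response(patch, kernel):
    """Adder-style response: negative l1 distance between patch and kernel.

    Only subtractions, absolute values, and additions are needed,
    so no multiplications are performed.
    """
    return -sum(abs(p - k) for p, k in zip(patch, kernel))


patch = [1.0, 2.0, 3.0]
kernel = [1.0, 0.0, -1.0]

conv_out = conv_response(patch, kernel)    # 1*1 + 2*0 + 3*(-1)
adder_out = adder_response(patch, kernel)  # -(|0| + |2| + |4|)

print(conv_out, adder_out)
```

Under this assumed definition, the adder response is never positive and reaches its maximum of zero when the patch equals the kernel, so the two filter types produce differently distributed outputs. That is one way to read the abstract's motivation for comparing ANN and CNN features in a transformed space rather than directly.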
Taxonomy
Topics: Advanced Image Processing Techniques · Image and Signal Denoising Methods · Neural Networks and Applications
Methods: Knowledge Distillation · Convolution
