Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation

Maria Tzelepi; Anastasios Tefas

arXiv:2108.11798·cs.CV·August 27, 2021

TL;DR

This paper introduces Online Self-Acquired Knowledge Distillation (OSAKD), a method that enhances lightweight neural network training by estimating class probabilities directly in feature space, reducing computational costs.

Contribution

The paper proposes a novel online knowledge distillation approach using non-parametric density estimation to improve model performance efficiently.

Findings

01 Effective on four datasets

02 Reduces computational cost compared to traditional KD

03 Improves accuracy of lightweight models

Abstract

Knowledge Distillation (KD) has been established as a highly promising approach for training compact and faster models by transferring knowledge from heavyweight and powerful models. However, KD in its conventional form is a lengthy process that is demanding in both computation and memory. In this paper, Online Self-Acquired Knowledge Distillation (OSAKD) is proposed, aiming to improve the performance of any deep neural model in an online manner. We use the k-NN non-parametric density estimation technique to estimate the unknown probability distributions of the data samples in the output feature space. This allows us to directly estimate the posterior class probabilities of the data samples, and we use them as soft labels that encode explicit information about the similarities of the data with the classes, with negligible effect on the computational cost. The experimental evaluation on…
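
The soft-label idea in the abstract can be illustrated with the standard k-NN posterior estimate, where the probability of class c for a sample is the fraction of its k nearest neighbors that belong to class c. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `knn_soft_labels`, the use of Euclidean distance, and the choice to exclude each sample from its own neighborhood are all hypothetical, and the features are assumed to be a NumPy array of output-layer representations for one batch.

```python
import numpy as np

def knn_soft_labels(features, labels, num_classes, k):
    """Estimate P(class | sample) as k_c / k over each sample's k nearest neighbors.

    Hypothetical helper for illustration; not taken from the paper.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    n = features.shape[0]

    # Pairwise squared Euclidean distances (assumed metric).
    sq_norms = np.sum(features ** 2, axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T

    # Assumption: a sample is not counted among its own neighbors.
    np.fill_diagonal(dists, np.inf)

    # Indices of the k nearest neighbors of every sample (order within the k is arbitrary).
    nn_idx = np.argpartition(dists, k - 1, axis=1)[:, :k]
    nn_labels = labels[nn_idx]  # shape (n, k)

    # Fraction of neighbors in each class gives the posterior estimate.
    soft = np.zeros((n, num_classes))
    for c in range(num_classes):
        soft[:, c] = np.mean(nn_labels == c, axis=1)
    return soft

# Toy batch: two well-separated clusters in a 2-D feature space.
feats = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labs = np.array([0, 0, 0, 1, 1, 1])
soft = knn_soft_labels(feats, labs, num_classes=2, k=2)
```

Each row of `soft` sums to 1 and could serve as a soft target alongside the hard labels; how the paper combines the two in its training objective is not stated in the abstract and is not assumed here.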

Taxonomy

Methods: Knowledge Distillation · k-Nearest Neighbors