Dynamically Sacrificing Accuracy for Reduced Computation: Cascaded Inference Based on Softmax Confidence

Konstantin Berestizshevsky; Guy Even

arXiv:1805.10982 · cs.LG · November 12, 2020

TL;DR

This paper introduces a method for reducing neural network computation by dynamically terminating inference based on softmax confidence thresholds, balancing accuracy loss and efficiency without retraining.
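The early-exit rule in the TL;DR can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the authors' code: the backbone is assumed to be split into sequential segments (`blocks`), each followed by a classifier head (`heads`), and a classifier's confidence is assumed to be its maximum softmax probability. The names `cascaded_predict`, `blocks`, `heads` and the list-of-callables interface are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def cascaded_predict(x, blocks, heads, thresholds):
    """Early-exit inference for a single input (hypothetical interface).

    blocks[k]: callable computing backbone segment k from the previous features.
    heads[k]: callable mapping segment-k features to class logits; the last
        head is the original network's classifier.
    thresholds[k]: confidence threshold of intermediate classifier k
        (len(blocks) - 1 values; the final classifier always answers).
    Returns (predicted_class, index_of_exit_classifier).
    """
    h = x
    last = len(blocks) - 1
    for k, (block, head) in enumerate(zip(blocks, heads)):
        h = block(h)                       # compute only the next segment
        probs = softmax(head(h))
        confidence = float(np.max(probs))  # assumed confidence: max softmax prob
        # Stop as soon as an intermediate classifier is confident enough;
        # later segments are never evaluated for this input.
        if k == last or confidence >= thresholds[k]:
            return int(np.argmax(probs)), k
```

With toy identity-like segments, an input with logits `[2.0, 0.0]` has confidence of about 0.88 at the first head, so it exits there under a threshold of 0.8 but continues to the final classifier under a threshold of 0.9. The savings come entirely from the segments that are skipped once a head is confident.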

Contribution

It proposes a novel cascaded inference approach that uses softmax confidence to adaptively trade accuracy for reduced computation in pre-trained models.

Findings

01. Achieves a 15%-50% reduction in multiply-accumulate (MAC) operations
02. Degrades classification accuracy by roughly 1%
03. Can be incorporated into pre-trained, non-cascaded architectures such as ResNet
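How a MAC-reduction figure of this kind is computed can be illustrated with a small cost model. It is a sketch under stated assumptions: an input that exits at classifier k pays for backbone segments 0..k plus every head evaluated along the way, and the baseline is the original network run end to end. The function names and the MAC counts and exit fractions used below are hypothetical, not numbers from the paper.

```python
def expected_macs(block_macs, head_macs, exit_fractions):
    """Average MACs per input for a cascade (hypothetical cost model).

    block_macs[k]: MACs of backbone segment k.
    head_macs[k]: MACs of classifier head k (the last entry is the original
        network's own classifier).
    exit_fractions[k]: fraction of inputs that stop at classifier k (sums to 1).
    An input exiting at k pays for segments 0..k and heads 0..k.
    """
    total, cumulative = 0.0, 0.0
    for b, h, p in zip(block_macs, head_macs, exit_fractions):
        cumulative += b + h          # cost of reaching and evaluating head k
        total += p * cumulative
    return total

def mac_reduction(block_macs, head_macs, exit_fractions):
    """Fractional saving relative to running the original network end to end."""
    baseline = sum(block_macs) + head_macs[-1]
    return 1.0 - expected_macs(block_macs, head_macs, exit_fractions) / baseline
```

For example, with three segments of 100M MACs, 1M-MAC heads and illustrative exit fractions 0.5/0.3/0.2, the expected cost is 171.7M MACs against a 301M baseline, a saving of about 43%. Inputs that reach the final classifier cost slightly more than the baseline (303M), because they also paid for the intermediate heads; the cascade only pays off when enough inputs exit early.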

Abstract

We study the tradeoff between computational effort and classification accuracy in a cascade of deep neural networks. During inference, the user sets the acceptable accuracy degradation which then automatically determines confidence thresholds for the intermediate classifiers. As soon as the confidence threshold is met, inference terminates immediately without having to compute the output of the complete network. Confidence levels are derived directly from the softmax outputs of intermediate classifiers, as we do not train special decision functions. We show that using a softmax output as a confidence measure in a cascade of deep neural networks leads to a reduction of 15%-50% in the number of MAC operations while degrading the classification accuracy by roughly 1%. Our method can be easily incorporated into pre-trained non-cascaded architectures, as we exemplify on ResNet. Our main…
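The abstract does not spell out how the user's acceptable accuracy degradation is turned into per-classifier thresholds. One plausible rule, given here as an assumption rather than the paper's exact procedure: on a held-out validation set, give each intermediate classifier the lowest threshold t for which its accuracy on inputs with confidence at least t is still no worse than the final classifier's accuracy minus the budget epsilon. The names `pick_threshold` and `thresholds_for_budget` are hypothetical.

```python
import numpy as np

def pick_threshold(confidences, correct, target_acc):
    """Lowest threshold t with accuracy on {confidence >= t} >= target_acc.

    confidences: max-softmax values of one intermediate classifier on a
        validation set; correct: True where that classifier's argmax is right.
    Returns float("inf") if no threshold meets the target, i.e. the
    classifier is never allowed to stop inference.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    if confidences.size == 0:
        return float("inf")
    order = np.argsort(-confidences, kind="stable")   # most confident first
    c = confidences[order]
    # Accuracy over the n most confident samples, for n = 1..N.
    acc = np.cumsum(correct[order]) / np.arange(1, len(c) + 1)
    # Only cut at the end of a run of tied confidences, since a threshold
    # keeps every sample whose confidence equals it.
    at_boundary = np.append(c[:-1] > c[1:], True)
    ok = np.flatnonzero(at_boundary & (acc >= target_acc))
    if ok.size == 0:
        return float("inf")
    # Largest qualifying subset <=> lowest threshold <=> most early exits.
    return float(c[ok[-1]])

def thresholds_for_budget(val_conf, val_correct, final_acc, epsilon):
    """One threshold per intermediate classifier for accuracy budget epsilon."""
    target = final_acc - epsilon
    return [pick_threshold(c, ok, target) for c, ok in zip(val_conf, val_correct)]
```

Choosing the lowest qualifying threshold is what converts the budget into savings: a lower threshold lets more inputs stop early, and a larger epsilon lowers the target accuracy, which in turn permits lower thresholds. Because accuracy over the retained subset is not guaranteed to be monotone in t, the sketch scans every candidate cut instead of stopping at the first failure.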

Code & Models

Repositories

AnonymousConferenceCode/Cascaded_Inference (TensorFlow, official)

Taxonomy

Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection