
TL;DR
This paper introduces SideNet, a small auxiliary classifier attached to a large neural network. SideNet classifies easy inputs early and passes the rest to the main network, which reduces overall computational cost with minimal performance loss.
Contribution
The paper proposes a novel adaptive computation method using SideNet, a small classifier that decides whether to classify early or pass to the main network, improving efficiency.
Findings
Substantial compute reduction with minimal accuracy loss on image and text tasks
SideNet classifications are well-calibrated and reliable
Complementary to existing compute reduction techniques
Abstract
As the performance and popularity of deep neural networks have increased, so too has their computational cost. There are many effective techniques for reducing a network's computational footprint (quantisation, pruning, knowledge distillation), but these lead to models whose computational cost is the same regardless of their input. Our human reaction times vary with the complexity of the tasks we perform: easier tasks (e.g. telling apart dogs from boats) are executed much faster than harder ones (e.g. telling apart two similar-looking breeds of dogs). Driven by this observation, we develop a method for adaptive network complexity by attaching a small classification layer, which we call SideNet, to a large pretrained network, which we call MainNet. Given an input, the SideNet returns a classification if its confidence level, obtained via softmax, surpasses a user-determined threshold, and…
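The decision rule described above (return SideNet's answer when its softmax confidence exceeds a threshold, otherwise defer to MainNet) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `side_net` and `main_net` are hypothetical stand-in callables that map an input to a list of logits, the function name `adaptive_classify` and the default threshold of 0.9 are assumptions, and the sketch does not model where SideNet attaches to MainNet or whether the two share features.

```python
import math
from typing import Callable, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_classify(
    x,
    side_net: Callable[[object], List[float]],  # hypothetical: input -> logits
    main_net: Callable[[object], List[float]],  # hypothetical: input -> logits
    threshold: float = 0.9,                     # assumed user-chosen value
) -> Tuple[int, bool]:
    """Return (predicted_class, used_main_net).

    SideNet answers on its own when its top softmax probability
    exceeds `threshold`; otherwise the input is handed to MainNet.
    """
    side_probs = softmax(side_net(x))
    confidence = max(side_probs)
    if confidence > threshold:
        # Early exit: the cheap classifier is confident enough.
        return side_probs.index(confidence), False

    # Fall back to the expensive network for harder inputs.
    main_probs = softmax(main_net(x))
    return main_probs.index(max(main_probs)), True
```

Under this sketch, raising `threshold` routes more inputs to `main_net` (more compute), while lowering it lets `side_net` answer more often (less compute).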
Taxonomy
Topics: Neural Networks and Applications · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning
Methods: Linear Layer · Convolution · Residual Block · Dense Connections · Weight Decay · Average Pooling · WordPiece · 1x1 Convolution · Residual Connection · Global Average Pooling
