QoS-Nets: Adaptive Approximate Neural Network Inference
Elias Trommer, Bernd Waschneck, Akash Kumar

TL;DR
This paper presents QoS-Nets, a method for dynamically adjusting neural network accuracy and resource use at runtime by selecting and fine-tuning multiple approximate multiplier configurations, enabling power savings with minimal accuracy loss.
Contribution
It introduces a novel search and fine-tuning approach for multiple operating points of approximate multipliers, allowing adaptive QoS in neural networks.
Findings
Achieves 15.3% to 42.8% power savings with minimal accuracy loss.
Supports multiple runtime operating points with only a 2.75% increase in parameters.
Demonstrates effectiveness on MobileNetV2 with adaptive approximate multipliers.
Abstract
In order to vary the arithmetic resource consumption of neural network applications at runtime, this work proposes the flexible reuse of approximate multipliers for neural network layer computations. We introduce a search algorithm that chooses an appropriate subset of approximate multipliers of a user-defined size from a larger search space and enables retraining to maximize task performance. Unlike previous work, our approach can output more than a single, static assignment of approximate multiplier instances to layers. These different operating points allow a system to gradually adapt its Quality of Service (QoS) to changing environmental conditions by increasing or decreasing its accuracy and resource consumption. QoS-Nets achieves this by reassigning the selected approximate multiplier instances to layers at runtime. To combine multiple operating points with the use of retraining,…
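The idea of reassigning a fixed subset of approximate multipliers to layers at runtime can be illustrated with a minimal sketch. This is not the authors' implementation. The names (`Multiplier`, `OperatingPoint`, `pick_operating_point`), the relative power figures, and the per-layer multiplication counts are hypothetical and chosen only for illustration. The sketch also assumes that an operating point with higher power consumption is the more accurate one, which the abstract implies but does not state as a rule.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Multiplier:
    name: str
    relative_power: float  # hypothetical: power relative to an exact multiplier (1.0)


@dataclass(frozen=True)
class OperatingPoint:
    name: str
    # assignment[i] is the index of the multiplier used by layer i
    assignment: Tuple[int, ...]


def relative_power(point: OperatingPoint,
                   multipliers: Sequence[Multiplier],
                   mults_per_layer: Sequence[int]) -> float:
    """Average multiplier power, weighted by each layer's multiplication count."""
    weighted = sum(multipliers[m].relative_power * n
                   for m, n in zip(point.assignment, mults_per_layer))
    return weighted / sum(mults_per_layer)


def pick_operating_point(points: Sequence[OperatingPoint],
                         multipliers: Sequence[Multiplier],
                         mults_per_layer: Sequence[int],
                         power_budget: float) -> OperatingPoint:
    """Pick the highest-power point that fits the budget (assumed most accurate).

    Falls back to the lowest-power point if none fits.
    """
    scored = [(relative_power(p, multipliers, mults_per_layer), p) for p in points]
    feasible = [sp for sp in scored if sp[0] <= power_budget]
    if feasible:
        return max(feasible, key=lambda sp: sp[0])[1]
    return min(scored, key=lambda sp: sp[0])[1]


# Hypothetical subset of three multipliers shared by every operating point.
MULTIPLIERS = (
    Multiplier("exact", 1.0),
    Multiplier("approx_a", 0.8),
    Multiplier("approx_b", 0.5),
)

# Hypothetical multiplication counts for a three-layer network.
MULTS_PER_LAYER = (100, 300, 600)

# Each operating point reuses the same multipliers with a different layer mapping.
POINTS = (
    OperatingPoint("high_quality", (0, 0, 1)),
    OperatingPoint("balanced", (0, 1, 2)),
    OperatingPoint("low_power", (1, 2, 2)),
)
```

A system could call `pick_operating_point(POINTS, MULTIPLIERS, MULTS_PER_LAYER, power_budget)` whenever its power budget changes and then switch the layer-to-multiplier mapping accordingly. The multiplier subset itself stays fixed, so only the assignment changes between operating points.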
Taxonomy
Topics: Neural Networks and Applications
Methods: Depthwise Convolution · Pointwise Convolution · Depthwise Separable Convolution · Batch Normalization · 1x1 Convolution · Inverted Residual Block · Convolution · Average Pooling
