QoS-Nets: Adaptive Approximate Neural Network Inference

Elias Trommer; Bernd Waschneck; Akash Kumar

arXiv:2410.07762·cs.LG·October 11, 2024

TL;DR

This paper presents QoS-Nets, a method for dynamically adjusting neural network accuracy and resource use at runtime by selecting and fine-tuning multiple approximate multiplier configurations, enabling power savings with minimal accuracy loss.

Contribution

It introduces a novel search and fine-tuning approach for multiple operating points of approximate multipliers, allowing adaptive QoS in neural networks.

Findings

01 Achieves 15.3% to 42.8% power savings with minimal accuracy loss.

02 Supports multiple runtime operating points with only a 2.75% increase in parameters.

03 Demonstrates effectiveness on MobileNetV2 with adaptive approximate multipliers.

Abstract

In order to vary the arithmetic resource consumption of neural network applications at runtime, this work proposes the flexible reuse of approximate multipliers for neural network layer computations. We introduce a search algorithm that chooses an appropriate subset of approximate multipliers of a user-defined size from a larger search space and enables retraining to maximize task performance. Unlike previous work, our approach can output more than a single, static assignment of approximate multiplier instances to layers. These different operating points allow a system to gradually adapt its Quality of Service (QoS) to changing environmental conditions by increasing or decreasing its accuracy and resource consumption. QoS-Nets achieves this by reassigning the selected approximate multiplier instances to layers at runtime. To combine multiple operating points with the use of retraining,…
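The idea of operating points can be illustrated with a minimal sketch. It is not the authors' search or retraining procedure: the multiplier names, relative power values, per-layer multiplication counts, and the `select_operating_point` helper are all hypothetical, and the sketch assumes that a higher-power operating point is also the more accurate one. Each operating point maps layers onto the same small set of selected multipliers, and the runtime picks the point that fits the current power budget.

```python
from dataclasses import dataclass

# Hypothetical relative power per multiplication (exact multiplier = 1.0).
# These numbers are illustrative only and are not taken from the paper.
MULTIPLIER_POWER = {"exact": 1.0, "approx_a": 0.8, "approx_b": 0.55}

# Hypothetical number of multiplications per layer, used as weights.
LAYER_MULTS = {"conv1": 100, "conv2": 300, "fc": 100}


@dataclass
class OperatingPoint:
    name: str
    assignment: dict  # layer name -> multiplier name

    def relative_power(self, layer_mults):
        """Multiplication-weighted power relative to an all-exact network."""
        total = sum(layer_mults.values())
        used = sum(
            MULTIPLIER_POWER[self.assignment[layer]] * count
            for layer, count in layer_mults.items()
        )
        return used / total


# Three operating points that reuse the same three multiplier instances.
POINTS = [
    OperatingPoint("high_accuracy",
                   {"conv1": "exact", "conv2": "exact", "fc": "exact"}),
    OperatingPoint("balanced",
                   {"conv1": "exact", "conv2": "approx_a", "fc": "approx_a"}),
    OperatingPoint("low_power",
                   {"conv1": "approx_a", "conv2": "approx_b", "fc": "approx_b"}),
]


def select_operating_point(points, layer_mults, power_budget):
    """Pick the most power-hungry point that still fits the budget.

    Assumes (for this sketch only) that higher power means higher accuracy.
    Falls back to the cheapest point when nothing fits.
    """
    by_power = sorted(points, key=lambda p: p.relative_power(layer_mults))
    fitting = [p for p in by_power
               if p.relative_power(layer_mults) <= power_budget]
    return fitting[-1] if fitting else by_power[0]


if __name__ == "__main__":
    for budget in (1.0, 0.9, 0.5):
        point = select_operating_point(POINTS, LAYER_MULTS, budget)
        print(budget, point.name, round(point.relative_power(LAYER_MULTS), 3))
```

Switching operating points here only changes the layer-to-multiplier mapping, which mirrors the reassignment at runtime described in the abstract; how the real system stores per-point parameters is not covered by this sketch.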

Taxonomy

Topics: Neural Networks and Applications

Methods: Depthwise Convolution · Pointwise Convolution · Depthwise Separable Convolution · Batch Normalization · 1x1 Convolution · Inverted Residual Block · Convolution · Average Pooling