Dynamic Network Adaptation at Inference

Daniel Mendoza; Caroline Trippel

arXiv:2204.08400 · cs.LG · April 19, 2022

TL;DR

This paper introduces SLO-Aware Neural Networks that dynamically adjust computation during inference by dropping nodes, enabling real-time systems to meet diverse latency and accuracy requirements efficiently.

Contribution

It proposes a novel method for per-inference dynamic node dropout based on SLO targets, improving speed and maintaining accuracy in inference-serving systems.
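
To make the idea of per-inference node dropout concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The top-k selection rule, the linear `keep_fraction_for` policy, and every function name are assumptions made for illustration only. The sketch keeps the k largest hidden activations for a query and lets the next layer read only those nodes, so the work in that layer shrinks with the latency budget.

```python
import math


def relu(v):
    return v if v > 0.0 else 0.0


def dense(weights, bias, inputs, active=None):
    """Dense layer that reads only the input indices listed in `active`."""
    if active is None:
        active = range(len(inputs))
    return [b + sum(row[i] * inputs[i] for i in active)
            for row, b in zip(weights, bias)]


def keep_fraction_for(latency_budget_ms, full_latency_ms, floor=0.1):
    """Hypothetical policy: keep a share of nodes proportional to the budget."""
    return max(floor, min(1.0, latency_budget_ms / full_latency_ms))


def top_k_indices(values, k):
    """Indices of the k largest values, returned in ascending index order."""
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return sorted(ranked[:k])


def slo_aware_forward(x, w1, b1, w2, b2, keep_fraction):
    """Two-layer MLP whose second layer skips the dropped hidden nodes."""
    hidden = [relu(v) for v in dense(w1, b1, x)]
    k = max(1, math.ceil(keep_fraction * len(hidden)))
    active = top_k_indices(hidden, k)  # chosen per query, from its activations
    return dense(w2, b2, hidden, active), active


# Toy weights (illustrative values, not from the paper).
W1 = [[1.0, 0.0, 0.0, 0.0],
      [0.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 0.0],
      [0.0, 0.0, 0.0, 1.0]]
B1 = [0.0, 0.0, 0.0, 0.0]
W2 = [[1.0, 1.0, 1.0, 1.0]]
B2 = [0.0]

query = [3.0, -1.0, 0.5, 2.0]
fraction = keep_fraction_for(latency_budget_ms=5.0, full_latency_ms=10.0)
output, kept = slo_aware_forward(query, W1, B1, W2, B2, fraction)
print(output, kept)
```

Because the kept set is recomputed for each query, a single set of weights can serve different latency budgets. How the paper selects nodes and maps SLO targets to a drop rate may differ from this sketch.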

Findings

01. Achieves average speedups of 1.3-56.7× with little to no accuracy loss

02. Serves multiple accuracy targets with a single trained model

03. Mitigates latency degradation from co-location interference

Abstract

Machine learning (ML) inference is a real-time workload that must comply with strict Service Level Objectives (SLOs), including latency and accuracy targets. Unfortunately, ensuring that SLOs are not violated in inference-serving systems is challenging due to inherent model accuracy-latency tradeoffs, SLO diversity across and within application domains, evolution of SLOs over time, unpredictable query patterns, and co-location interference. In this paper, we observe that neural networks exhibit high degrees of per-input activation sparsity during inference. Thus, we propose SLO-Aware Neural Networks which dynamically drop out nodes per-inference query, thereby tuning the amount of computation performed, according to specified SLO optimization targets and machine utilization. SLO-Aware Neural Networks achieve average speedups of $1.3 - 56.7 \times$ with little to no accuracy loss (less…

Taxonomy

Topics: Advanced Neural Network Applications · Age of Information Optimization · Brain Tumor Detection and Classification