Multi-DNN Inference of Sparse Models on Edge SoCs

Jiawei Luo; Di Wu; Simon Dobson; Blesson Varghese

arXiv:2603.09642·cs.DC·March 11, 2026

Open Access

TL;DR

This paper introduces SparseLoom, a system for multi-DNN inference on edge SoCs that stitches subgraphs from sparse models without re-training, reducing SLO violation rates by up to 74% and improving throughput by up to 2.31x.

Contribution

The paper proposes model stitching for multi-DNN inference, which recombines subgraphs of sparse models into new model variants without re-training, and demonstrates with SparseLoom that the technique can be deployed on edge SoCs.

Findings

01 Reduces SLO violation rates by up to 74%.

02 Improves throughput by up to 2.31x.

03 Lowers memory overhead by an average of 28%.

Abstract

Modern edge applications increasingly require multi-DNN inference systems to execute tasks on heterogeneous processors, gaining performance both from concurrent execution and from matching each model to its best-suited accelerator. However, existing systems support only a single model (or a few sparse variants) per task, which impedes the efficiency of this matching and results in high Service Level Objective (SLO) violation rates. We introduce model stitching for multi-DNN inference systems, which creates model variants by recombining subgraphs from sparse models without re-training. We present a demonstrator system, SparseLoom, that shows model stitching can be deployed to SoCs. We show experimentally that SparseLoom reduces SLO violation rates by up to 74%, improves throughput by up to 2.31x, and lowers memory overhead by an average of 28% compared to state-of-the-art multi-DNN inference…
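
To make the idea of recombining subgraphs concrete, the following is a minimal sketch and not SparseLoom's actual algorithm, which this summary does not describe. It assumes each sparse variant of a model is split at the same boundaries into the same number of subgraph positions, that any variant's subgraph can be placed at its position, and that end-to-end latency is the sum of per-subgraph latencies. The variant names, the `LATENCY_MS` table, and both function names are hypothetical, and the latency numbers are placeholders rather than measurements.

```python
from itertools import product

# Hypothetical per-subgraph latencies in ms (placeholder numbers, not measurements).
# Keys are sparse-variant names; each list holds one entry per subgraph position.
LATENCY_MS = {
    "dense":    [4.0, 6.0, 5.0],
    "sparse50": [3.0, 4.0, 3.5],
    "sparse80": [2.0, 2.5, 2.0],
}

def stitched_variants(latency_ms):
    """Yield (choice, total_latency) for every per-position mix of variants."""
    names = sorted(latency_ms)
    positions = len(next(iter(latency_ms.values())))
    for choice in product(names, repeat=positions):
        # Assumption: latency is additive across stitched subgraphs.
        total = sum(latency_ms[name][i] for i, name in enumerate(choice))
        yield choice, total

def feasible(latency_ms, slo_ms):
    """Return the stitched variants whose summed latency is within the SLO."""
    return [(c, t) for c, t in stitched_variants(latency_ms) if t <= slo_ms]

all_variants = list(stitched_variants(LATENCY_MS))
print(len(all_variants))  # 3 variants ** 3 positions = 27
print(len(feasible(LATENCY_MS, slo_ms=10.0)))
```

Under these assumptions, three sparse variants with three subgraph positions yield 27 candidate models instead of 3, which illustrates why stitching gives a scheduler more options when matching models to accelerators under an SLO.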

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Advanced Memory and Neural Computing