On-Demand Multi-Task Sparsity for Efficient Large-Model Deployment on Edge Devices
Lianming Huang, Haibo Hu, Qiao Li, Nan Guan, Chun Jason Xue

TL;DR
This paper presents an on-demand multi-task sparsity framework that reduces task switching latency in large models on edge devices by maximizing parameter reuse and dynamically loading minimal parameter subsets.
Contribution
It introduces a novel sparsity approach that aligns sparse structures across tasks and loads only necessary parameter blocks, significantly improving switching efficiency.
Findings
Achieves over 6.6X faster task switching on average compared to existing methods.
Reduces cold-start latency in multi-task large-model deployment.
Demonstrates effectiveness on a real-world autonomous driving platform.
Abstract
Sparsity is essential for deploying large models on resource-constrained edge platforms. However, optimizing sparsity patterns for individual tasks in isolation ignores the significant I/O overhead incurred during frequent task switching. We introduce an on-demand multi-task sparsity framework specifically designed to minimize switching costs by maximizing parameter reuse. Unlike monolithic approaches, we decompose weights into reusable block-granular units and align sparse structures across tasks to maximize overlap. By dynamically loading only the small differential set of blocks required for the next task, our method effectively mitigates the cold-start latency inherent in traditional monolithic approaches. Experiments on a real-world autonomous driving platform demonstrate that our framework achieves superior switching efficiency, accelerating task switching by over 6.6X on average…
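The differential loading idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the block identifiers, task names, block size, and the `plan_switch` helper are all hypothetical, chosen only to show how a set difference over block IDs yields the blocks that must be read from storage when switching tasks.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical block identifier: (layer name, block index within that layer).
BlockId = Tuple[str, int]


def plan_switch(
    resident: Set[BlockId], needed: Set[BlockId]
) -> Tuple[Set[BlockId], Set[BlockId], Set[BlockId]]:
    """Split a task switch into blocks to load, blocks to evict, and blocks reused."""
    to_load = needed - resident    # differential set: the only blocks read from storage
    to_evict = resident - needed   # blocks the next task does not use
    reused = resident & needed     # blocks already in memory, no I/O required
    return to_load, to_evict, reused


# Assumed per-task sparse structures (illustrative values, not from the paper).
task_blocks: Dict[str, FrozenSet[BlockId]] = {
    "detection": frozenset({("fc1", 0), ("fc1", 1), ("fc2", 0)}),
    "segmentation": frozenset({("fc1", 0), ("fc1", 2), ("fc2", 0)}),
}

BLOCK_BYTES = 64 * 1024  # assumed uniform block size

resident = set(task_blocks["detection"])
to_load, to_evict, reused = plan_switch(resident, set(task_blocks["segmentation"]))

# I/O volume of the differential switch versus reloading every block of the next task.
differential_bytes = len(to_load) * BLOCK_BYTES
monolithic_bytes = len(task_blocks["segmentation"]) * BLOCK_BYTES
```

In this toy setup the two tasks share two of three blocks, so the switch reads one block instead of three. The more the sparse structures are aligned across tasks, the smaller `to_load` becomes.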
Taxonomy
Topics: IoT and Edge/Fog Computing · Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques
