SparOA: Sparse and Operator-aware Hybrid Scheduling for Edge DNN Inference
Ziyang Zhang, Jie Liu, Luca Mottola

TL;DR
SparOA is a hybrid CPU-GPU inference framework that optimizes DNN performance on edge devices by leveraging sparsity and operator characteristics, using reinforcement learning for dynamic scheduling.
Contribution
The paper introduces SparOA, a novel hybrid inference framework that combines sparsity, operator-awareness, and reinforcement learning for optimized edge DNN inference.
Findings
Achieves a 1.22-1.31x speedup over baselines.
Outperforms CPU-only inference by up to 50.7x.
Reduces energy consumption by 7-16%.
Abstract
The resource demands of deep neural network (DNN) models introduce significant performance challenges, especially when deployed on resource-constrained edge devices. Existing solutions like model compression often sacrifice accuracy, while specialized hardware remains costly and inflexible. Hybrid inference methods, meanwhile, typically overlook how operator characteristics impact performance. In this work, we present SparOA, a CPU-GPU hybrid inference framework that leverages both sparsity and computational intensity to optimize operator scheduling. SparOA addresses these challenges through three key components: (1) a threshold predictor that accurately determines optimal sparsity and computational intensity thresholds; (2) a reinforcement learning-based scheduler that dynamically optimizes resource allocation based on real-time hardware states; and (3) a hybrid inference…
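To make the idea of threshold-based operator placement concrete, here is a minimal Python sketch. It is not SparOA's implementation: the abstract does not give the placement rule, so the rule below (sparse, low-intensity operators go to the CPU, everything else to the GPU) is an assumption, and the names `OperatorProfile`, `sparsity_of`, `computational_intensity`, and `assign_device` are hypothetical. In SparOA the thresholds come from a learned predictor and the final decision from a reinforcement learning scheduler; here they are plain function arguments.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class OperatorProfile:
    """Hypothetical per-operator profile used only for this illustration."""
    name: str
    sparsity: float   # fraction of zero-valued input elements, in [0, 1]
    intensity: float  # FLOPs per byte of memory traffic


def sparsity_of(values: Sequence[float]) -> float:
    """Return the fraction of elements that are exactly zero."""
    if not values:
        return 0.0
    zeros = sum(1 for v in values if v == 0)
    return zeros / len(values)


def computational_intensity(flops: float, bytes_moved: float) -> float:
    """Return FLOPs per byte moved (arithmetic intensity)."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


def assign_device(op: OperatorProfile,
                  sparsity_threshold: float,
                  intensity_threshold: float) -> str:
    """Pick a device with a static two-threshold rule.

    Assumed rule (not taken from the paper): an operator that is both
    sparse and low in computational intensity runs on the CPU; any
    other operator runs on the GPU.
    """
    if op.sparsity >= sparsity_threshold and op.intensity < intensity_threshold:
        return "cpu"
    return "gpu"


# Example usage with made-up profiles and thresholds.
ops = [
    OperatorProfile("relu", sparsity=0.8, intensity=0.5),
    OperatorProfile("conv2d", sparsity=0.1, intensity=20.0),
]
placement = {op.name: assign_device(op, 0.6, 4.0) for op in ops}
```

With these made-up numbers, `relu` is placed on the CPU and `conv2d` on the GPU. A static rule like this ignores real-time hardware state, which is the gap the abstract's reinforcement learning-based scheduler is described as covering.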
Taxonomy
Topics: Advanced Neural Network Applications · Big Data and Digital Economy · IoT and Edge/Fog Computing
