Parallax: Runtime Parallelization for Operator Fallbacks in Heterogeneous Edge Systems
Chong Tang, Hao Dai, Jagmohan Chauhan

TL;DR
Parallax is a framework that improves real-time DNN inference on edge devices by optimizing parallel execution and memory management of fallback operators, reducing latency and energy consumption.
Contribution
It introduces a novel approach to parallelizing fallback operators and managing their memory in heterogeneous edge systems, without requiring model modifications or custom operators.
Findings
Up to 46% latency reduction on mobile devices
Maintains controlled memory overhead (26.5% on average)
Achieves up to 30% energy savings
Abstract
The growing demand for real-time DNN applications on edge devices necessitates faster inference of increasingly complex models. Although many devices include specialized accelerators (e.g., mobile GPUs), dynamic control-flow operators and unsupported kernels often fall back to CPU execution. Existing frameworks handle these fallbacks poorly, leaving CPU cores idle and causing high latency and memory spikes. We introduce Parallax, a framework that accelerates mobile DNN inference without model refactoring or custom operator implementations. Parallax first partitions the computation DAG to expose parallelism, then employs branch-aware memory management with dedicated arenas and buffer reuse to reduce the runtime footprint. An adaptive scheduler executes branches according to device memory constraints; meanwhile, fine-grained subgraph control enables heterogeneous inference of dynamic models.…
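The abstract does not specify Parallax's partitioning or scheduling algorithms. As a rough illustration only, the Python sketch below makes several assumptions: a "branch" is a maximal linear chain of operators in the DAG; each branch gets its own arena sized for ping-pong buffer reuse (only an operator's input and output buffers are live at once, and the branch's first input is owned by the producing branch); and the scheduler greedily packs ready branches into waves that may run in parallel while their combined arena size stays within a memory budget. The functions `partition_into_branches`, `arena_bytes`, and `schedule`, and all byte sizes, are hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter

# Hypothetical sketch under stated assumptions; not Parallax's actual algorithm or API.

def partition_into_branches(succ):
    """Split a DAG (node -> list of successors) into maximal linear chains ("branches")."""
    pred = defaultdict(list)
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)
    nodes = sorted(set(succ) | set(pred))
    order = TopologicalSorter({n: pred[n] for n in nodes}).static_order()
    branches, branch_of = [], {}
    for n in order:
        ps = pred[n]
        if len(ps) == 1 and len(succ[ps[0]]) == 1:
            b = branch_of[ps[0]]      # single edge in and out: extend the parent's chain
        else:
            b = len(branches)         # source, fork target, or join: open a new branch
            branches.append([])
        branches[b].append(n)
        branch_of[n] = b
    return branches, branch_of, pred

def arena_bytes(chain, out_bytes):
    """Assumed per-branch arena: ping-pong reuse keeps only one op's input and output live."""
    peak, prev = 0, 0
    for n in chain:
        peak = max(peak, prev + out_bytes[n])
        prev = out_bytes[n]
    return peak

def schedule(succ, out_bytes, budget):
    """Group ready branches into waves that may run in parallel within a memory budget."""
    branches, branch_of, pred = partition_into_branches(succ)
    deps = [set() for _ in branches]
    for n, b in branch_of.items():
        for p in pred[n]:
            if branch_of[p] != b:
                deps[b].add(branch_of[p])
    arena = [arena_bytes(c, out_bytes) for c in branches]
    done, waves = set(), []
    while len(done) < len(branches):
        ready = [b for b in range(len(branches)) if b not in done and deps[b] <= done]
        wave, used = [], 0
        for b in sorted(ready, key=lambda b: arena[b]):  # pack small arenas first
            # Always admit one branch so an over-budget branch still runs (alone).
            if not wave or used + arena[b] <= budget:
                wave.append(b)
                used += arena[b]
        done.update(wave)
        waves.append([branches[b] for b in wave])
    return waves
```

For a diamond-shaped graph with two chains between a fork and a join, a budget large enough for both chain arenas yields three waves (fork, both chains together, join), while a tighter budget serializes the two chains. A real runtime would also weigh core counts and accelerator placement; this sketch models memory only.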
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Big Data and Digital Economy
