Parallax: Runtime Parallelization for Operator Fallbacks in Heterogeneous Edge Systems

Chong Tang; Hao Dai; Jagmohan Chauhan

arXiv:2512.11532·cs.DC·December 15, 2025

TL;DR

Parallax is a framework that improves real-time DNN inference on edge devices by optimizing parallel execution and memory management of fallback operators, reducing latency and energy consumption.

Contribution

Parallax introduces a novel approach to parallelizing and managing operator fallbacks in heterogeneous edge systems without requiring model modifications or custom operators.

Findings

01 Up to 46% latency reduction on mobile devices

02 Maintains controlled memory overhead (26.5% on average)

03 Achieves up to 30% energy savings

Abstract

The growing demand for real-time DNN applications on edge devices necessitates faster inference of increasingly complex models. Although many devices include specialized accelerators (e.g., mobile GPUs), dynamic control-flow operators and unsupported kernels often fall back to CPU execution. Existing frameworks handle these fallbacks poorly, leaving CPU cores idle and causing high latency and memory spikes. We introduce Parallax, a framework that accelerates mobile DNN inference without model refactoring or custom operator implementations. Parallax first partitions the computation DAG to expose parallelism, then employs branch-aware memory management with dedicated arenas and buffer reuse to reduce runtime footprint. An adaptive scheduler executes branches according to device memory constraints, while fine-grained subgraph control enables heterogeneous inference of dynamic models.…
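To make the partition-then-schedule idea in the abstract concrete, here is a minimal sketch. It is not the Parallax implementation. It assumes a toy operator DAG, groups mutually independent nodes by topological level, and packs each level's branches into "waves" whose summed peak memory fits a budget before running each wave on a thread pool. The function names (`topo_levels`, `pack_waves`, `run_waves`), the next-fit packing rule, and the memory figures are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def topo_levels(nodes, edges):
    """Group DAG nodes into levels; nodes within one level do not depend on each other."""
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    level = [n for n in nodes if indegree[n] == 0]
    levels = []
    while level:
        levels.append(level)
        next_level = []
        for src in level:
            for dst in successors[src]:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    next_level.append(dst)
        level = next_level
    return levels


def pack_waves(branches, mem_cost, budget):
    """Next-fit packing (an assumed policy): start a new wave when the budget would be exceeded."""
    waves, current, used = [], [], 0
    for branch in branches:
        cost = mem_cost[branch]
        if current and used + cost > budget:
            waves.append(current)
            current, used = [], 0
        # A branch larger than the whole budget still runs, alone in its own wave.
        current.append(branch)
        used += cost
    if current:
        waves.append(current)
    return waves


def run_waves(waves, run_branch, max_workers=4):
    """Run the branches of each wave concurrently; waves themselves run one after another."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for wave in waves:
            for name, out in zip(wave, pool.map(run_branch, wave)):
                results[name] = out
    return results


# Toy graph: one input fans out to three independent branches that join again.
NODES = ["in", "a", "b", "c", "out"]
EDGES = [("in", "a"), ("in", "b"), ("in", "c"),
         ("a", "out"), ("b", "out"), ("c", "out")]
MEM_MB = {"a": 40, "b": 50, "c": 30}  # made-up peak memory per branch
```

With a budget of 100 MB, `pack_waves(topo_levels(NODES, EDGES)[1], MEM_MB, 100)` places `a` and `b` in the first wave and defers `c` to a second one, which mirrors the trade-off the abstract describes between using idle CPU cores and bounding the memory spike.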

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Big Data and Digital Economy