Accelerating Mobile Inference through Fine-Grained CPU-GPU Co-Execution

Zhuojin Li; Marco Paolieri; Leana Golubchik

arXiv:2510.21081·cs.LG·February 20, 2026


TL;DR

This paper accelerates neural network inference on mobile devices by splitting work between CPU and GPU. It combines a lightweight synchronization mechanism with machine learning models that predict execution times, and reports speedups of up to 1.89x for linear layers and 1.75x for convolutional layers.

Contribution

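The paper proposes a lightweight synchronization mechanism based on OpenCL fine-grained shared virtual memory (SVM), together with machine learning models that predict CPU and GPU execution times, so that work can be divided between the two processors during inference on mobile devices.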

Findings

01 Achieves up to 1.89x speedup for linear layers

02 Achieves up to 1.75x speedup for convolutional layers

03 Demonstrates speedups close to the maximum achievable on mobile platforms
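
To give a sense of what a "maximum possible" speedup can mean for co-execution, here is a back-of-the-envelope bound. It is a sketch under assumptions that are not taken from the paper: the layer's work is perfectly divisible, each processor's time scales linearly with its share, and synchronization is free. The symbols $T_c$, $T_g$ (CPU-only and GPU-only times) and $\alpha$ (fraction of work sent to the CPU) are introduced here for illustration only, and the paper may define its bound differently.

```latex
% Co-execution time when a fraction \alpha of the work goes to the CPU:
T(\alpha) = \max\bigl(\alpha\, T_c,\; (1-\alpha)\, T_g\bigr)

% The maximum is minimized when both sides finish together:
\alpha^{*} T_c = (1-\alpha^{*})\, T_g
\quad\Longrightarrow\quad
\alpha^{*} = \frac{T_g}{T_c + T_g},
\qquad
T(\alpha^{*}) = \frac{T_c\, T_g}{T_c + T_g}

% Ideal speedup relative to GPU-only execution:
S_{\max} = \frac{T_g}{T(\alpha^{*})} = 1 + \frac{T_g}{T_c}
```

Under these assumptions, a CPU that is as fast as the GPU ($T_c = T_g$) gives an ideal speedup of 2x, and a CPU that is half as fast ($T_c = 2\,T_g$) gives 1.5x. Any real synchronization cost or prediction error pushes the achieved speedup below this bound.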

Abstract

Deploying deep neural networks on mobile devices is increasingly important but remains challenging due to limited computing resources. On the other hand, their unified memory architecture and narrower gap between CPU and GPU performance provide an opportunity to reduce inference latency by assigning tasks to both CPU and GPU. The main obstacles for such collaborative execution are the significant synchronization overhead required to combine partial results, and the difficulty of predicting execution times of tasks assigned to CPU and GPU (due to the dynamic selection of implementations and parallelism level). To overcome these obstacles, we propose both a lightweight synchronization mechanism based on OpenCL fine-grained shared virtual memory (SVM) and machine learning models to accurately predict execution times. Notably, these models capture the performance characteristics of GPU…
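
The role of execution time prediction described in the abstract can be illustrated with a minimal sketch: given predictors for CPU and GPU time as a function of assigned work, choose the split that minimizes the slower side plus a synchronization cost. Everything named here (`best_split`, the toy linear predictors, the unit of "output channels", the overhead value) is a hypothetical stand-in for illustration, not the paper's models or API.

```python
from typing import Callable, Tuple


def best_split(total_channels: int,
               predict_cpu_ms: Callable[[int], float],
               predict_gpu_ms: Callable[[int], float],
               sync_overhead_ms: float = 0.0) -> Tuple[int, float]:
    """Return (channels assigned to the CPU, predicted latency in ms).

    Exhaustively tries every split of the output channels and keeps the
    one with the lowest predicted latency. The latency of a split is the
    slower of the two processors plus a fixed synchronization cost, which
    is only paid when both processors actually participate.
    """
    # Baseline: everything on the GPU, no synchronization needed.
    best_c, best_t = 0, predict_gpu_ms(total_channels)
    for c in range(1, total_channels + 1):
        g = total_channels - c
        t_cpu = predict_cpu_ms(c)
        t_gpu = predict_gpu_ms(g) if g > 0 else 0.0
        sync = sync_overhead_ms if g > 0 else 0.0
        t = max(t_cpu, t_gpu) + sync
        if t < best_t:
            best_c, best_t = c, t
    return best_c, best_t


# Toy predictors (assumed numbers, not measurements): the GPU pays a fixed
# launch cost but is faster per channel; the CPU has no fixed cost.
def toy_cpu_ms(n: int) -> float:
    return 0.03 * n


def toy_gpu_ms(n: int) -> float:
    return 0.5 + 0.01 * n


gpu_only_ms = toy_gpu_ms(256)
cpu_ch, latency_ms = best_split(256, toy_cpu_ms, toy_gpu_ms,
                                sync_overhead_ms=0.05)
print(f"CPU channels: {cpu_ch}, predicted latency: {latency_ms:.2f} ms, "
      f"GPU-only: {gpu_only_ms:.2f} ms")
```

The exhaustive search is deliberate: the abstract notes that implementations and parallelism levels are selected dynamically, so real predictors need not be linear or even monotonic in the assigned work, and a closed-form balance point may not exist. The sketch also shows why synchronization cost matters: if `sync_overhead_ms` exceeds the time saved by splitting, the search falls back to the GPU-only baseline.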
