Memory-Efficient Partitioned DNN Inference on Resource-Constrained Android Crowds

Lakshani Manamperi; Disumi Pathirana; Thiwanka Pathirana; Nipun Premarathna; Kutila Gunasekera

arXiv:2605.20723 · cs.LG · May 21, 2026

TL;DR

This paper presents the DNN pipeline scheduling subsystem of CROWDio, which runs unmodified ONNX models across resource-constrained Android devices by distributing memory pressure among them, making edge ML deployment practical.

Contribution

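CROWDio's scheduling subsystem combines five mechanisms (JIT deferred partition loading, a single-partition-resident constraint, a 4-tier affinity scheduler, zlib-compressed tensor transport, and a streaming 1:1 dependency model) to run large DNN inference across multiple Android devices without model changes, reducing per-device memory and energy usage.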

Findings

01. Achieves peak per-device RSS of 43 MB on DistilBERT

02. Limits battery draw to 50 mAh per run

03. Reduces batch latency by 34% with streaming concurrency

Abstract

Deploying large deep neural networks on memory-constrained mobile devices is a central challenge in edge ML. While compression, pruning, and quantization reduce per-parameter cost, transformer-based models remain too large for the 3.3-7.4 GB RAM envelope of commodity Android handsets. We present the DNN pipeline scheduling subsystem of CROWDio, which achieves practical ONNX inference across resource-constrained Android workers without model modification, by distributing memory pressure across devices via five mechanisms: JIT deferred partition loading, a single-partition-resident constraint, a 4-tier affinity scheduler, a zlib-compressed tensor transport, and a streaming 1:1 dependency model. Evaluated on DistilBERT (Sanh et al., 2019) (approximately 67 M parameters, SST-2) across five Android handsets over ten runs, our system holds peak per-device RSS to 43 ± 2 MB and limits battery…
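Three of the mechanisms named in the abstract (deferred partition loading, the single-partition-resident constraint, and zlib-compressed tensor transport) can be illustrated with a small sketch. This is not CROWDio's code: the names `pack_tensor`, `unpack_tensor`, and `Worker`, the wire format, and the use of plain Python callables in place of ONNX partitions are all assumptions made for illustration.

```python
import struct
import zlib
from array import array


def pack_tensor(values, shape):
    """Serialize a float32 tensor as a small header plus a zlib-compressed payload.

    Assumed wire format (hypothetical): uint32 ndim, ndim x uint32 dims
    (little-endian), then the compressed float32 data in native byte order.
    """
    payload = array("f", values).tobytes()
    header = struct.pack(f"<I{len(shape)}I", len(shape), *shape)
    return header + zlib.compress(payload, level=6)


def unpack_tensor(blob):
    """Inverse of pack_tensor: returns (list of floats, shape tuple)."""
    (ndim,) = struct.unpack_from("<I", blob, 0)
    shape = struct.unpack_from(f"<{ndim}I", blob, 4)
    values = array("f")
    values.frombytes(zlib.decompress(blob[4 + 4 * ndim:]))
    return list(values), tuple(shape)


class Worker:
    """Hypothetical worker that keeps at most one partition resident."""

    def __init__(self, loaders):
        # partition_id -> zero-argument callable that loads the partition.
        # In a real system this would open a model file; here it returns
        # a plain Python function standing in for the partition.
        self._loaders = loaders
        self._resident_id = None
        self._resident = None
        self.load_count = 0

    def run(self, partition_id, blob):
        if self._resident_id != partition_id:
            # Release the previous partition before loading the next one,
            # so only a single partition is ever held in memory.
            self._resident = None
            # Deferred load: nothing is loaded until a task needs it.
            self._resident = self._loaders[partition_id]()
            self._resident_id = partition_id
            self.load_count += 1
        values, shape = unpack_tensor(blob)
        # The toy partitions are elementwise, so the shape is unchanged.
        return pack_tensor(self._resident(values), shape)
```

In this sketch a partition's output blob is passed directly as the next partition's input, which loosely mirrors a one-to-one dependency between consecutive stages. How the actual system schedules partitions across devices is not specified in the text above.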
