SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud
Stefanos Laskaridis, Stylianos I. Venieris, Mario Almeida, Ilias Leontiadis, Nicholas D. Lane

TL;DR
SPINN is a distributed device-cloud inference system for CNNs that adapts dynamically to network conditions, improving throughput, reducing costs, and maintaining accuracy in mobile and mission-critical applications.
Contribution
It introduces a novel scheduler that co-optimizes early-exit policies and CNN splitting at runtime for robust, efficient inference across diverse conditions.
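To make the idea of co-optimizing the two knobs concrete, here is a minimal sketch in Python. It is not SPINN's actual scheduler: the `Profile` and `Config` types, the single early exit, the per-layer timings, and the latency model are all assumptions made for illustration. The sketch enumerates every (split point, exit threshold) pair, discards pairs that miss an accuracy target, and keeps the one with the lowest expected latency under the currently measured bandwidth.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class Profile:
    """Hypothetical offline profile of one CNN (all fields are assumptions)."""
    device_ms: tuple   # per-layer latency on the device
    server_ms: tuple   # per-layer latency on the server
    cut_kbit: tuple    # size of the tensor entering layer i, in kilobits
    exit_at: int       # number of layers that run before the single early exit
    exit_rate: dict    # threshold -> fraction of samples leaving at the exit
    accuracy: dict     # threshold -> measured accuracy with that threshold


@dataclass(frozen=True)
class Config:
    split: int         # layers [0, split) run on device, the rest on server
    threshold: float   # confidence threshold of the early exit


def expected_latency_ms(cfg, prof, bandwidth_mbps, rtt_ms):
    """Expected per-sample latency of a configuration under a toy cost model."""
    n = len(prof.device_ms)
    p_exit = prof.exit_rate[cfg.threshold]

    def p_run(i):
        # Layers before the exit always run; later ones only for samples
        # that did not leave at the early exit.
        return 1.0 if i < prof.exit_at else 1.0 - p_exit

    total = 0.0
    for i in range(n):
        cost = prof.device_ms[i] if i < cfg.split else prof.server_ms[i]
        total += p_run(i) * cost
    if cfg.split < n:
        # kilobits / (megabits per second) yields milliseconds.
        transfer_ms = rtt_ms + prof.cut_kbit[cfg.split] / bandwidth_mbps
        total += p_run(cfg.split) * transfer_ms
    return total


def choose_config(prof, bandwidth_mbps, rtt_ms, min_accuracy) -> Optional[Config]:
    """Pick the lowest-latency (split, threshold) pair meeting the accuracy target."""
    n = len(prof.device_ms)
    candidates = [
        Config(split, thr)
        for split, thr in product(range(n + 1), prof.exit_rate)
        if prof.accuracy[thr] >= min_accuracy
    ]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda c: expected_latency_ms(c, prof, bandwidth_mbps, rtt_ms),
    )


# Made-up numbers for a 4-layer network, purely for illustration.
PROFILE = Profile(
    device_ms=(10.0, 10.0, 10.0, 10.0),
    server_ms=(1.0, 1.0, 1.0, 1.0),
    cut_kbit=(800.0, 400.0, 200.0, 100.0),
    exit_at=2,
    exit_rate={0.8: 0.5, 1.0: 0.0},
    accuracy={0.8: 0.90, 1.0: 0.95},
)

if __name__ == "__main__":
    print(choose_config(PROFILE, bandwidth_mbps=100.0, rtt_ms=5.0, min_accuracy=0.85))
    print(choose_config(PROFILE, bandwidth_mbps=1.0, rtt_ms=50.0, min_accuracy=0.85))
```

Re-running `choose_config` whenever the measured bandwidth or round-trip time changes is what makes the decision a runtime one in this sketch: with the made-up profile, a fast link favors offloading while a slow link pushes the whole network onto the device. A real scheduler would also have to account for factors this toy model ignores, such as server load, multiple exits, and energy.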
Findings
Improves throughput by up to 2x over state-of-the-art methods
Reduces server costs by up to 6.8x
Improves accuracy by 20.7% under latency constraints
Abstract
Despite the soaring use of convolutional neural networks (CNNs) in mobile applications, uniformly sustaining high-performance inference on mobile has been elusive due to the excessive computational demands of modern CNNs and the increasing diversity of deployed devices. A popular alternative comprises offloading CNN processing to powerful cloud-based servers. Nevertheless, by relying on the cloud to produce outputs, emerging mission-critical and high-mobility applications, such as drone obstacle avoidance or interactive applications, can suffer from the dynamic connectivity conditions and the uncertain availability of the cloud. In this paper, we propose SPINN, a distributed inference system that employs synergistic device-cloud computation together with a progressive inference method to deliver fast and robust CNN inference across diverse settings. The proposed system introduces a…
