TL;DR
DEFER is a distributed framework that partitions deep neural networks across multiple edge devices to improve inference throughput and reduce per-device energy consumption. On ResNet50 with 8 nodes, it reports 53% higher throughput and 63% lower per-node energy consumption than single-device inference.
Contribution
This paper introduces DEFER, a novel distributed edge inference framework that partitions DNNs across multiple nodes, enhancing throughput and energy efficiency compared to single-device inference.
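As a concrete picture of what "partitioning a DNN across nodes" means, here is a minimal sketch. It assumes each layer's compute cost on a node is known in advance (the per-layer costs are hypothetical, in milliseconds) and ignores communication time. It splits the layers into contiguous stages so that the slowest stage is as fast as possible. This bottleneck-minimizing split is a standard linear-partition technique and is not necessarily how DEFER assigns layers; `partition_layers` and `pipeline_throughput` are hypothetical names.

```python
from typing import List


def _greedy_stages(costs: List[int], limit: int) -> List[List[int]]:
    """Pack consecutive layer indices into stages whose total cost stays <= limit."""
    stages: List[List[int]] = []
    current: List[int] = []
    total = 0
    for i, c in enumerate(costs):
        if current and total + c > limit:
            stages.append(current)
            current, total = [], 0
        current.append(i)
        total += c
    if current:
        stages.append(current)
    return stages


def partition_layers(costs: List[int], num_nodes: int) -> List[List[int]]:
    """Split layers into at most num_nodes contiguous stages, minimizing the
    cost of the slowest stage via binary search on the bottleneck value."""
    lo, hi = max(costs), sum(costs)
    while lo < hi:
        mid = (lo + hi) // 2
        if len(_greedy_stages(costs, mid)) <= num_nodes:
            hi = mid
        else:
            lo = mid + 1
    return _greedy_stages(costs, lo)


def pipeline_throughput(costs: List[int], stages: List[List[int]]) -> float:
    """Inferences per second (costs in ms), limited by the slowest stage."""
    bottleneck = max(sum(costs[i] for i in stage) for stage in stages)
    return 1000.0 / bottleneck


if __name__ == "__main__":
    layer_costs_ms = [4, 8, 6, 3, 7, 5, 2, 9]  # hypothetical per-layer costs
    stages = partition_layers(layer_costs_ms, num_nodes=3)
    print(stages)  # [[0, 1], [2, 3, 4], [5, 6, 7]]
    print(pipeline_throughput(layer_costs_ms, stages))  # 62.5
    # Single-device baseline: every layer runs in one stage.
    print(pipeline_throughput(layer_costs_ms, [list(range(len(layer_costs_ms)))]))
```

In a pipeline, steady-state throughput is set by the slowest stage. In this toy model, splitting 44 ms of hypothetical work into stages of at most 16 ms raises the throughput bound from about 22.7 to 62.5 inferences per second. Real deployments also pay for transferring activations between nodes, which this sketch leaves out.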
Findings
53% higher inference throughput with 8 nodes than single-device inference (ResNet50)
63% lower energy consumption per node than single-device inference
Reduced network payload with compression algorithms
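The payload-reduction finding can be illustrated with a minimal sketch that serializes an intermediate activation tensor as float32 bytes and compresses it before it would be sent to the next node. It uses Python's standard `zlib` module as a stand-in. This is not the compression scheme DEFER uses, and the activation values are synthetic. Post-ReLU activations often contain many zeros, which is what makes them compressible. `encode_activation` and `decode_activation` are hypothetical names.

```python
import zlib
from array import array
from typing import List, Sequence


def encode_activation(values: Sequence[float], level: int = 6) -> bytes:
    """Serialize activations as float32 and compress them for transmission."""
    raw = array("f", values).tobytes()
    return zlib.compress(raw, level)


def decode_activation(payload: bytes) -> List[float]:
    """Decompress a payload and recover the float32 activations."""
    values = array("f")
    values.frombytes(zlib.decompress(payload))
    return values.tolist()


if __name__ == "__main__":
    # Synthetic, sparse "activation" vector: three of every four entries are zero.
    activations = [0.0 if i % 4 else (i % 13) * 0.5 for i in range(16384)]
    raw_size = len(array("f", activations).tobytes())
    payload = encode_activation(activations)
    print(f"raw: {raw_size} bytes, compressed: {len(payload)} bytes")
    assert decode_activation(payload) == activations  # lossless round trip
```

The trade-off is extra CPU time on both ends of each link in exchange for fewer bytes on the network. Whether that pays off depends on link bandwidth and on how compressible the real activations are.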
Abstract
Modern machine learning tools such as deep neural networks (DNNs) are playing a revolutionary role in many fields such as natural language processing, computer vision, and the internet of things. Once they are trained, deep learning models can be deployed on edge computers to perform classification and prediction on real-time data for these applications. Particularly for large models, the limited computational and memory resources on a single edge device can become the throughput bottleneck for an inference pipeline. To increase throughput and decrease per-device compute load, we present DEFER (Distributed Edge inFERence), a framework for distributed edge inference, which partitions deep neural networks into layers that can be spread across multiple compute nodes. The architecture consists of a single "dispatcher" node to distribute DNN partitions and inference data to respective…
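The dispatcher/compute-node architecture described in the abstract can be sketched as a pipeline. This is a minimal single-process illustration, assuming each partition is a list of Python callables. Threads connected by `queue.Queue` stand in for networked edge devices, and `compute_node` and `run_pipeline` are hypothetical names, not DEFER's API. Because of Python's global interpreter lock, the sketch shows the data flow rather than a real parallel speedup.

```python
import queue
import threading
from typing import Callable, Dict, List, Sequence

Layer = Callable[[float], float]
_STOP = object()  # sentinel telling a node to shut down


def compute_node(layers: Sequence[Layer], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply this node's partition of layers to each input and forward the result."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next node
            return
        seq, x = item
        for layer in layers:
            x = layer(x)
        outbox.put((seq, x))


def run_pipeline(partitions: List[Sequence[Layer]], inputs: Sequence[float]) -> List[float]:
    """Dispatcher: feed inputs to the first node and collect results from the last."""
    queues = [queue.Queue() for _ in range(len(partitions) + 1)]
    workers = [
        threading.Thread(target=compute_node, args=(part, queues[i], queues[i + 1]))
        for i, part in enumerate(partitions)
    ]
    for w in workers:
        w.start()
    for seq, x in enumerate(inputs):
        queues[0].put((seq, x))
    queues[0].put(_STOP)

    results: Dict[int, float] = {}
    while True:
        item = queues[-1].get()
        if item is _STOP:
            break
        seq, y = item
        results[seq] = y
    for w in workers:
        w.join()
    return [results[i] for i in range(len(inputs))]


if __name__ == "__main__":
    layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
    partitions = [layers[:2], layers[2:]]  # two hypothetical "nodes"
    print(run_pipeline(partitions, [1.0, 2.0, 3.0]))  # [1.0, 9.0, 25.0]
```

While one node works on input *k*, the previous node can already start on input *k + 1*. This overlap is what lets a multi-node pipeline sustain higher throughput than running every layer on a single device.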
