Priority-Aware Model-Distributed Inference at Edge Networks

Teng Li; Hulya Seferoglu

arXiv:2412.12371·cs.DC·December 18, 2024

TL;DR

This paper introduces PA-MDI, a priority-aware model-distributed inference framework for edge networks that optimizes model allocation based on source importance, reducing inference time.

Contribution

The paper formulates a priority-aware model allocation problem and proposes a practical algorithm for distributed inference that accounts for source importance.

Findings

01

PA-MDI effectively allocates models based on source priority.

02

Experimental results show reduced inference time compared to baselines.

03

Validated on real edge devices and testbeds with various models.

Abstract

Distributed inference techniques can be broadly classified into data-distributed and model-distributed schemes. In data-distributed inference (DDI), each worker carries the entire Machine Learning (ML) model but processes only a subset of the data. However, feeding the data to workers results in high communication costs, especially when the data is large. An emerging paradigm is model-distributed inference (MDI), where each worker carries only a subset of the ML layers. In MDI, a source device that has data processes a few layers of the ML model and sends the output to a neighboring device, i.e., offloads the rest of the layers. This process ends when all layers have been processed in a distributed manner. In this paper, we investigate the design and development of MDI when multiple data sources co-exist. We consider that each data source has a different importance and, hence, a priority. We…
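
The MDI pattern described in the abstract (each worker holds a contiguous subset of layers and forwards its output to the next device) can be illustrated with a minimal Python sketch. This is not the paper's PA-MDI algorithm and does not model priorities; the names `split_layers` and `run_mdi`, the equal-size split, and the toy scalar "layers" are all assumptions made for illustration only.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for an ML layer: a function from activation to activation.
Layer = Callable[[float], float]


def split_layers(layers: Sequence[Layer], num_workers: int) -> List[List[Layer]]:
    """Split layers into contiguous, near-equal chunks, one chunk per worker.

    Assumption: an even split. A real allocator would weigh compute,
    bandwidth, and (in the paper's setting) source priority.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    base, extra = divmod(len(layers), num_workers)
    chunks: List[List[Layer]] = []
    start = 0
    for worker in range(num_workers):
        size = base + (1 if worker < extra else 0)
        chunks.append(list(layers[start:start + size]))
        start += size
    return chunks


def run_mdi(chunks: List[List[Layer]], x: float) -> float:
    """Run the chunks in order, as if each lived on a different device."""
    for chunk in chunks:
        for layer in chunk:
            x = layer(x)
        # In a real deployment, x would be sent over the network
        # to the neighboring device at this point.
    return x


# Toy model with four scalar "layers".
layers: List[Layer] = [
    lambda v: v + 1.0,
    lambda v: v * 2.0,
    lambda v: v - 3.0,
    lambda v: v * v,
]
chunks = split_layers(layers, 3)
result = run_mdi(chunks, 1.0)
print([len(c) for c in chunks], result)
```

In this sketch the first worker plays the role of the source device: it processes its own chunk and offloads the remaining layers by handing the intermediate activation to the next worker.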

Taxonomy

Topics: Advanced Graph Neural Networks · Bayesian Modeling and Causal Inference

Methods: Attention Is All You Need · Cosine Annealing · Residual Connection · Linear Layer · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Weight Decay · Softmax · Attention Dropout