Priority-Aware Model-Distributed Inference at Edge Networks
Teng Li, Hulya Seferoglu

TL;DR
This paper introduces PA-MDI, a priority-aware model-distributed inference framework for edge networks that optimizes model allocation based on source importance, reducing inference time.
Contribution
It formulates a priority-aware model allocation problem and proposes a practical algorithm for distributed inference considering source importance.
Findings
PA-MDI effectively allocates models based on source priority.
Experimental results show reduced inference time compared to baselines.
Validated on real edge devices and testbeds with various models.
Abstract
Distributed inference techniques can be broadly classified into data-distributed and model-distributed schemes. In data-distributed inference (DDI), each worker carries the entire Machine Learning (ML) model but processes only a subset of the data. However, feeding the data to workers results in high communication costs, especially when the data is large. An emerging paradigm is model-distributed inference (MDI), where each worker carries only a subset of the ML model's layers. In MDI, a source device that has data processes the first few layers of the ML model and sends the output to a neighboring device, i.e., offloads the remaining layers. This process continues until all layers have been processed in a distributed manner. In this paper, we investigate the design and development of MDI when multiple data sources co-exist. We consider that each data source has a different importance and, hence, a different priority. We…
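The MDI hand-off described above can be illustrated with a minimal sketch. It assumes a toy model made of hypothetical layer functions (`make_toy_layers`) and a hypothetical allocation format, a list of `(device_name, num_layers)` pairs in hop order starting at the source device. Each device runs its contiguous block of layers and forwards the intermediate activation to the next device. This is not the PA-MDI allocation algorithm: it models neither source priorities nor communication delays, and only shows that a layer-wise split across devices produces the same output as running the whole model on one device.

```python
from typing import Callable, List, Sequence, Tuple

Activation = List[float]
Layer = Callable[[Activation], Activation]


def make_toy_layers(num_layers: int) -> List[Layer]:
    """Hypothetical toy model: layer i multiplies each value by (i + 2), then adds 1."""
    # The default argument k binds the multiplier at creation time for each lambda.
    return [lambda x, k=i + 2: [k * v + 1.0 for v in x] for i in range(num_layers)]


def run_centralized(layers: Sequence[Layer], x: Activation) -> Activation:
    """Reference run: a single device executes every layer in order."""
    for layer in layers:
        x = layer(x)
    return x


def run_model_distributed(
    layers: Sequence[Layer],
    allocation: Sequence[Tuple[str, int]],
    x: Activation,
) -> Tuple[Activation, List[str]]:
    """Execute the model across a chain of devices (assumed allocation format).

    allocation: (device_name, num_layers) pairs in hop order; the first entry is
    the source device that holds the input data. Each device processes its
    contiguous block of layers, then "sends" the activation to the next device.
    """
    if any(n < 0 for _, n in allocation):
        raise ValueError("layer counts must be non-negative")
    if sum(n for _, n in allocation) != len(layers):
        raise ValueError("allocation must cover every layer exactly once")
    trace: List[str] = []
    start = 0
    for device, n in allocation:
        if n == 0:
            continue  # this device holds no layers in this allocation
        for layer in layers[start:start + n]:
            x = layer(x)
        trace.append(f"{device}: layers {start}..{start + n - 1}")
        start += n  # the activation x is what gets offloaded to the next hop
    return x, trace
```

For example, with six layers split as `[("source", 2), ("edge-1", 3), ("edge-2", 1)]`, the source runs layers 0 and 1, offloads the rest, and the final output equals `run_centralized(layers, x)`. Deciding how many layers each device should take, and for which source, is the allocation problem the paper addresses.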
Taxonomy
Topics: Advanced Graph Neural Networks · Bayesian Modeling and Causal Inference
Methods: Attention Is All You Need · Cosine Annealing · Residual Connection · Linear Layer · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Weight Decay · Softmax · Attention Dropout
