Collaborative Inference Acceleration with Non-Penetrative Tensor Partitioning

Zhibang Liu; Chaonong Xu; Zhenjie Lv; Zhizhuo Liu; Suyu Zhao

arXiv:2501.04489 · cs.DC · January 9, 2025

TL;DR

This paper introduces Non-Penetrative Tensor Partitioning (NPTP), a fine-grained tensor partitioning method that reduces communication latency in collaborative DNN inference on IoT devices and thereby lowers overall inference latency.

Contribution

NPTP is a new fine-grained tensor partitioning technique that minimizes the communication load of shared tiles, improving inference speed over existing collaborative inference methods.

Findings

1. NPTP achieves 1.44x to 1.68x faster inference than CoEdge.

2. Experimental validation on four DNN models shows consistent speedup.

3. NPTP effectively reduces communication overhead in collaborative inference.

Abstract

The inference of large-sized images on Internet of Things (IoT) devices is commonly hindered by limited resources, while there are often stringent latency requirements for Deep Neural Network (DNN) inference. Currently, this problem is generally addressed by collaborative inference, where the large-sized image is partitioned into multiple tiles, and each tile is assigned to an IoT device for processing. However, because tile sharing incurs communication overhead and therefore significant latency, the existing collaborative inference strategy is inefficient for convolutional computation, which is indispensable for any DNN. To reduce this overhead, we propose Non-Penetrative Tensor Partitioning (NPTP), a fine-grained tensor partitioning method that reduces the communication latency by minimizing the communication load of shared tiles, thereby reducing inference latency. We evaluate…
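
The tile-sharing overhead that the abstract refers to can be illustrated with a minimal sketch. It models only the baseline setup (an image split into tiles, one per device), not NPTP itself, whose details are not given on this page. The assumptions are ours: the image is split into horizontal strips, the convolution has stride 1 with "same" padding and an odd kernel size, and all function names are hypothetical.

```python
def halo_rows(kernel_size: int) -> int:
    """Rows a strip needs beyond each of its edges for a stride-1,
    'same'-padded convolution with an odd kernel size (assumption)."""
    return (kernel_size - 1) // 2


def strip_partition(height: int, num_devices: int) -> list[tuple[int, int]]:
    """Split rows [0, height) into contiguous, near-equal strips,
    one per device. Returns (start, end) pairs with end exclusive."""
    base, extra = divmod(height, num_devices)
    strips, start = [], 0
    for d in range(num_devices):
        rows = base + (1 if d < extra else 0)
        strips.append((start, start + rows))
        start += rows
    return strips


def shared_elements(height: int, width: int, channels: int,
                    kernel_size: int, num_devices: int) -> int:
    """Count the input elements that devices must receive from other
    devices to compute one convolution layer on their own strip."""
    h = halo_rows(kernel_size)
    total = 0
    for start, end in strip_partition(height, num_devices):
        above = min(h, start)          # rows owned by devices above
        below = min(h, height - end)   # rows owned by devices below
        total += (above + below) * width * channels
    return total


if __name__ == "__main__":
    # Hypothetical example: 224x224 RGB input, 3x3 kernel, 4 devices.
    print(shared_elements(224, 224, 3, 3, 4))
```

Under these assumptions the shared volume grows with the kernel size and with the number of strip boundaries, and it recurs at every convolution layer, which is why the abstract singles out convolution as the costly case for tile-based collaborative inference.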

Taxonomy

Topics: Tensor decomposition and applications · Distributed and Parallel Computing Systems