Distributed Collaborative Inference System in Next-Generation Networks and Communication
Chuan Zhang, Xixi Zheng, Xiaolong Tao, Chenfei Hu, Weiting Zhang, and Liehuang Zhu

TL;DR
This paper proposes a multi-level collaborative inference system for 6G networks that reduces latency and improves efficiency in generative AI tasks by deploying models across network layers and optimizing task offloading.
Contribution
It introduces a novel deployment and task offloading strategy combined with an early exit mechanism for efficient GAI inference in next-generation networks.
Findings
Reduces inference latency by up to 17%
Maintains high inference accuracy
Enhances efficiency in resource-constrained devices
Abstract
With the rapid advancement of artificial intelligence, generative artificial intelligence (GAI) has taken a leading role in transforming data processing methods. However, the high computational demands of GAI present challenges for devices with limited resources. As we move towards the sixth generation of mobile networks (6G), the higher data rates and improved energy efficiency of 6G create a need for more efficient data processing in GAI. Traditional GAI, however, shows its limitations in meeting these demands. To address these challenges, we introduce a multi-level collaborative inference system designed for next-generation networks and communication. Our proposed system features a deployment strategy that assigns models of varying sizes to devices at different network layers. Then, we design a task offloading strategy to optimise both efficiency and latency. Furthermore, a modified…
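The abstract names three ingredients: models of different sizes placed at different network layers, a task offloading strategy, and (per the Contribution summary) an early exit mechanism. The following is a minimal sketch of how these pieces could fit together, not the authors' method: the tier names, latency figures, toy confidence rules, and the 0.8 threshold are all hypothetical and chosen only for illustration. A task is tried on the smallest model first and is offloaded to the next layer only when the current model's confidence falls below the threshold.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    """One network layer hosting a model (all values here are hypothetical)."""
    name: str
    latency_ms: float  # assumed combined compute + transfer cost of this tier
    predict: Callable[[str], Tuple[str, float]]  # returns (answer, confidence in [0, 1])


def collaborative_infer(task: str, tiers: List[Tier], threshold: float = 0.8):
    """Try tiers from smallest to largest and exit early once one is confident.

    Returns (answer, name of the tier that answered, accumulated latency in ms).
    If no tier reaches the threshold, the last tier's answer is returned.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    total_ms = 0.0
    for tier in tiers:
        answer, confidence = tier.predict(task)
        total_ms += tier.latency_ms
        if confidence >= threshold:
            break  # early exit: no need to offload further
    return answer, tier.name, total_ms


def make_toy_model(name: str, max_len: int, high: float, low: float):
    """Stand-in for a real model: confident only on tasks up to max_len characters."""
    def predict(task: str) -> Tuple[str, float]:
        confidence = high if len(task) <= max_len else low
        return f"{name}:{task}", confidence
    return predict


# Hypothetical three-layer deployment: small model on the device, larger ones upstream.
TIERS = [
    Tier("device", 5.0, make_toy_model("device", 10, 0.9, 0.4)),
    Tier("edge", 20.0, make_toy_model("edge", 30, 0.85, 0.5)),
    Tier("cloud", 80.0, make_toy_model("cloud", 10**9, 0.99, 0.99)),
]

if __name__ == "__main__":
    for task in ["hi", "a medium sized task", "x" * 40]:
        print(collaborative_infer(task, TIERS))
```

In this sketch, easy tasks finish on the device and pay only the device latency, while harder ones accumulate the cost of every tier they pass through. A real offloading strategy would presumably also weigh network conditions and device load, which this toy ignores.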
Taxonomy
Topics: Cognitive Computing and Networks
