The Larger the Merrier? Efficient Large AI Model Inference in Wireless Edge Networks
Zhonghao Lyu, Ming Xiao, Jie Xu, Mikael Skoglund, Marco Di Renzo

TL;DR
This paper proposes a pruning-aware co-inference scheme for large AI models (LAIMs) in wireless edge networks, optimizing model partitioning and resource management to balance inference accuracy, latency, and energy efficiency.
Contribution
It introduces an analytical framework linking pruning ratio to output distortion and develops an optimization algorithm for resource allocation in edge-based LAIM inference.
Findings
The LAIM output distortion is upper bounded by its parameter distortion.
Joint pruning and resource management improve performance over benchmarks.
Split point selection is critical in resource-limited edge environments.
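
Why the split point matters can be illustrated with a toy latency model. The sketch below is not the paper's formulation: it ignores accuracy, energy, and pruning, and every name and number in it (layer_flops, out_bits, the compute speeds, the uplink rate) is a hypothetical value chosen for illustration. It evaluates the end-to-end latency of each candidate split and picks the smallest.

```python
def co_inference_latency(split, layer_flops, out_bits, input_bits,
                         device_flops_per_s, server_flops_per_s,
                         uplink_bits_per_s):
    """Latency when layers [0, split) run on the device and the rest on the server.

    Assumed toy model: latency = device compute + uplink transfer + server compute.
    """
    n_layers = len(layer_flops)
    device_time = sum(layer_flops[:split]) / device_flops_per_s
    server_time = sum(layer_flops[split:]) / server_flops_per_s
    if split == 0:
        sent_bits = input_bits           # the raw input is uploaded
    elif split == n_layers:
        sent_bits = 0                    # fully on-device, nothing is uploaded
    else:
        sent_bits = out_bits[split - 1]  # activation of the last on-device layer
    return device_time + sent_bits / uplink_bits_per_s + server_time


def best_split(layer_flops, out_bits, input_bits,
               device_flops_per_s, server_flops_per_s, uplink_bits_per_s):
    """Return the split index in 0..len(layer_flops) with the lowest latency."""
    return min(
        range(len(layer_flops) + 1),
        key=lambda s: co_inference_latency(
            s, layer_flops, out_bits, input_bits,
            device_flops_per_s, server_flops_per_s, uplink_bits_per_s),
    )


# Hypothetical four-layer model and hardware figures (illustrative only).
LAYER_FLOPS = [4e9, 4e9, 4e9, 4e9]
OUT_BITS = [8e6, 1e6, 1e6, 1e4]
INPUT_BITS = 2e7
DEVICE_SPEED = 1e10   # FLOP/s on the edge device
SERVER_SPEED = 1e11   # FLOP/s on the edge server

if __name__ == "__main__":
    for rate in (1e5, 1e7, 1e9):  # uplink rate in bit/s
        s = best_split(LAYER_FLOPS, OUT_BITS, INPUT_BITS,
                       DEVICE_SPEED, SERVER_SPEED, rate)
        print(f"uplink {rate:.0e} bit/s -> best split after layer {s}")
```

Under these assumed numbers the preferred split moves from fully on-device, to a mid-model cut, to fully on-server as the uplink rate grows, which is the kind of sensitivity the finding refers to.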
Abstract
The growing demand for large artificial intelligence model (LAIM) services is driving a paradigm shift from traditional cloud-based inference to edge-based inference for low-latency, privacy-preserving applications. In particular, edge-device co-inference, which partitions LAIMs between edge devices and servers, has emerged as a promising strategy for resource-efficient LAIM execution in wireless networks. In this paper, we investigate a pruning-aware LAIM co-inference scheme, where a pre-trained LAIM is pruned and partitioned into on-device and on-server sub-models for deployment. For analysis, we first prove that the LAIM output distortion is upper bounded by its parameter distortion. Then, we derive a lower bound on parameter distortion via rate-distortion theory, analytically capturing the relationship between pruning ratio and co-inference performance. Next, based on the analytical…
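
The link between pruning ratio, parameter distortion, and output distortion can be made concrete for a single linear layer. The following is a minimal sketch under stated assumptions, not the paper's proof or its distortion measure: it uses magnitude pruning on a random weight matrix, takes the Frobenius norm of the weight change as the parameter distortion, and checks the standard inequality ||(W - W')x|| <= ||W - W'||_F ||x||. All function names are hypothetical.

```python
import math
import random


def magnitude_prune(weights, ratio):
    """Zero out the fraction `ratio` of entries with the smallest magnitude."""
    cells = sorted(
        (abs(w), i, j)
        for i, row in enumerate(weights)
        for j, w in enumerate(row)
    )
    k = int(ratio * len(cells))  # number of entries to remove
    pruned = [row[:] for row in weights]
    for _, i, j in cells[:k]:
        pruned[i][j] = 0.0
    return pruned


def matvec(weights, x):
    """Output of a bias-free linear layer."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def l2_norm(v):
    return math.sqrt(sum(t * t for t in v))


def frobenius_distance(a, b):
    """Frobenius norm of the difference between two matrices."""
    return math.sqrt(sum((p - q) ** 2
                         for ra, rb in zip(a, b)
                         for p, q in zip(ra, rb)))


def distortion_report(ratio, rows=8, cols=16, seed=0):
    """Return (parameter distortion, output distortion, upper bound)."""
    rng = random.Random(seed)
    w = [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]
    x = [rng.gauss(0, 1) for _ in range(cols)]
    w_pruned = magnitude_prune(w, ratio)
    param = frobenius_distance(w, w_pruned)
    y, y_pruned = matvec(w, x), matvec(w_pruned, x)
    out = l2_norm([a - b for a, b in zip(y, y_pruned)])
    bound = param * l2_norm(x)  # operator norm <= Frobenius norm
    return param, out, bound


if __name__ == "__main__":
    for ratio in (0.0, 0.25, 0.5, 0.75):
        param, out, bound = distortion_report(ratio)
        print(f"ratio={ratio:.2f} param={param:.3f} out={out:.3f} bound={bound:.3f}")
```

Because the same seed is reused, a larger pruning ratio removes a superset of the entries removed at a smaller ratio, so the parameter distortion in this toy setting never decreases as the ratio grows.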
Taxonomy
Topics: Distributed Sensor Networks and Detection Algorithms · Energy Efficient Wireless Sensor Networks
Methods: Pruning
