Quantization-Aware Collaborative Inference for Large Embodied AI Models
Zhonghao Lyu, Ming Xiao, Mikael Skoglund, Merouane Debbah, H. Vincent Poor

TL;DR
This paper proposes quantization-aware collaborative inference for large embodied AI models. It jointly chooses the quantization bit-width and the computation frequency so that inference quality is preserved under delay and energy constraints.
Contribution
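It introduces a tractable approximation for quantization-induced inference distortion and derives lower and upper bounds on the quantization rate-inference distortion function. It then formulates a joint quantization bit-width and computation frequency design problem under delay and energy constraints.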
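The paper's own distortion approximation is not reproduced in this summary. For intuition only, the sketch below shows the textbook quantity that such approximations usually build on: for a uniform b-bit quantizer with step Δ, the per-weight mean squared error is roughly Δ²/12, so each extra bit cuts it by about a factor of four. The helper names (`uniform_quantize`, `predicted_mse`, `empirical_mse`) and the i.i.d. uniform "weights" are hypothetical stand-ins, not LAIM statistics, and this is weight-domain error, not inference distortion.

```python
import random

def uniform_quantize(x, lo, hi, bits):
    """Map x to the nearest of 2**bits evenly spaced levels spanning [lo, hi]."""
    step = (hi - lo) / (2 ** bits - 1)
    x = min(max(x, lo), hi)  # clip to the quantizer range
    return lo + round((x - lo) / step) * step

def predicted_mse(lo, hi, bits):
    """High-resolution approximation of per-weight quantization MSE: step**2 / 12."""
    step = (hi - lo) / (2 ** bits - 1)
    return step ** 2 / 12

def empirical_mse(weights, lo, hi, bits):
    """Average squared error actually introduced by uniform_quantize."""
    return sum((w - uniform_quantize(w, lo, hi, bits)) ** 2 for w in weights) / len(weights)

random.seed(0)
# Hypothetical "weights": i.i.d. uniform on [-1, 1], chosen only for illustration.
weights = [random.uniform(-1.0, 1.0) for _ in range(50_000)]
for b in (2, 4, 8):
    print(b, empirical_mse(weights, -1.0, 1.0, b), predicted_mse(-1.0, 1.0, b))
```

For uniformly distributed inputs the empirical and predicted values should agree closely at every bit-width. Real model weights are not uniform, which is one reason a model-aware approximation such as the paper's is needed.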
Findings
The distortion approximation accurately predicts inference quality loss.
The derived bounds effectively guide resource allocation.
Joint design improves latency and energy efficiency in edge AI systems.
Abstract
Large artificial intelligence models (LAIMs) are increasingly regarded as a core intelligence engine for embodied AI applications. However, the massive parameter scale and computational demands of LAIMs pose significant challenges for resource-limited embodied agents. To address this issue, we investigate quantization-aware collaborative inference (co-inference) for embodied AI systems. First, we develop a tractable approximation for quantization-induced inference distortion. Based on this approximation, we derive lower and upper bounds on the quantization rate-inference distortion function, characterizing its dependence on LAIM statistics, including the quantization bit-width. Next, we formulate a joint quantization bit-width and computation frequency design problem under delay and energy constraints, aiming to minimize the distortion upper bound while ensuring tightness through the…
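The abstract is cut off before the solution method, and the paper's delay, energy, and distortion models are not given here. The sketch below is only a rough illustration of what a joint bit-width and frequency design under delay and energy budgets can look like. It rests on several hypothetical assumptions: on-device cycles grow linearly with bit-width, energy follows the common CMOS model κ·C·f², and distortion is proxied by the uniform-quantizer Δ²/12 term instead of the paper's upper bound. It also ignores the transmission leg of co-inference. `DeviceModel`, `best_design`, and every numeric value are invented for illustration. The search is plain brute force, not the paper's optimization.

```python
from dataclasses import dataclass

@dataclass
class DeviceModel:
    """Hypothetical on-device compute model; all fields are illustrative placeholders."""
    num_params: float            # parameters executed on the embodied agent
    cycles_per_param_bit: float  # CPU cycles per parameter per bit (assumed linear in bit-width)
    kappa: float                 # effective switched capacitance, energy = kappa * cycles * f**2
    f_max: float                 # maximum CPU frequency in Hz

def cycles(dev, bits):
    """Cycles needed for the on-device part of inference at a given bit-width."""
    return dev.num_params * dev.cycles_per_param_bit * bits

def distortion_proxy(bits, value_range=2.0):
    """Per-weight uniform-quantizer MSE, step**2 / 12 (a stand-in for the paper's bound)."""
    step = value_range / (2 ** bits - 1)
    return step ** 2 / 12

def best_design(dev, t_max, e_max, bit_choices=range(2, 17)):
    """Brute-force search for the (bit-width, frequency) pair with the lowest proxy distortion."""
    best = None
    for b in bit_choices:
        c = cycles(dev, b)
        f = c / t_max                    # slowest frequency that still meets the delay budget
        if f > dev.f_max:
            continue                     # delay budget cannot be met at this bit-width
        energy = dev.kappa * c * f ** 2  # energy grows with f, so f = c / t_max is energy-optimal
        if energy > e_max:
            continue                     # energy budget exceeded
        d = distortion_proxy(b)
        if best is None or d < best["distortion"]:
            best = {"bits": b, "freq_hz": f, "distortion": d, "energy_j": energy}
    return best

dev = DeviceModel(num_params=1e8, cycles_per_param_bit=0.5, kappa=1e-28, f_max=2e9)
print(best_design(dev, t_max=0.5, e_max=0.02))
```

With these made-up numbers, the delay budget alone would allow up to 16 bits. The energy budget is what binds, and the search settles on 7 bits at 0.7 GHz. Running at the slowest delay-feasible frequency is what couples the two design variables: a higher bit-width forces a higher frequency, and energy then grows roughly with the cube of the bit-width.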
Taxonomy
Topics: Advanced Neural Network Applications · Single-cell and spatial transcriptomics · Ferroelectric and Negative Capacitance Devices
