TL;DR
This paper introduces a progressive semantic communication framework for edge-cloud vision-language models, enabling adaptive, bandwidth-efficient inference with maintained semantic fidelity on resource-constrained devices.
Contribution
It proposes a Meta AutoEncoder-based adaptive compression scheme that allows flexible, plug-and-play deployment of VLMs without fine-tuning, optimizing communication under bandwidth constraints.
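To make the bandwidth-adaptive idea concrete, here is a minimal sketch of how an edge device might decide how much of a progressively refinable representation to send. It is not the paper's Meta AutoEncoder. The function names (`select_prefix`, `transmit_time_s`), the layer sizes, and the latency budget are all hypothetical and chosen only for illustration. The sketch assumes the representation is split into ordered layers (a base layer followed by refinements) and that the device sends the longest prefix that fits the transmission budget.

```python
from typing import List


def select_prefix(layer_bytes: List[int], uplink_bps: float, latency_budget_s: float) -> int:
    """Return how many progressive layers (base layer first) fit in the budget.

    layer_bytes: hypothetical size of each layer in bytes, in send order.
    uplink_bps: available uplink bandwidth in bits per second.
    latency_budget_s: time allowed for transmission, in seconds.
    """
    budget_bits = uplink_bps * latency_budget_s
    sent_bits = 0
    count = 0
    for size in layer_bytes:
        bits = size * 8
        if sent_bits + bits > budget_bits:
            break  # later layers only refine earlier ones, so stop at the first miss
        sent_bits += bits
        count += 1
    return count


def transmit_time_s(layer_bytes: List[int], count: int, uplink_bps: float) -> float:
    """Transmission time in seconds for the first `count` layers."""
    return sum(layer_bytes[:count]) * 8 / uplink_bps


# Hypothetical layer sizes in bytes (assumed for illustration, not from the paper).
layers = [16_000, 16_000, 32_000, 64_000]

n_slow = select_prefix(layers, uplink_bps=1e6, latency_budget_s=0.5)
n_fast = select_prefix(layers, uplink_bps=2e6, latency_budget_s=0.5)
print(n_slow, transmit_time_s(layers, n_slow, 1e6))
print(n_fast, transmit_time_s(layers, n_fast, 2e6))
```

With these assumed sizes, doubling the uplink lets one more refinement layer through within the same budget, which is the behavior a fixed-size representation cannot offer.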
Findings
Significantly lower latency than baseline methods at a 1 Mbps uplink.
Semantic consistency remains high even at high compression levels.
Deployment on embedded platforms is demonstrated with an end-to-end system.
Abstract
Deploying Vision-Language Models (VLMs) on edge devices remains challenging due to their substantial computational and memory demands, which exceed the capabilities of resource-constrained embedded platforms. Conversely, fully offloading inference to the cloud is often impractical in bandwidth-limited environments, where transmitting raw visual data introduces substantial latency overhead. While recent edge-cloud collaborative architectures attempt to partition VLM workloads across devices, they typically rely on transmitting fixed-size representations, lacking adaptability to dynamic network conditions and failing to fully exploit semantic redundancy. In this paper, we propose a progressive semantic communication framework for edge-cloud VLM inference, using a Meta AutoEncoder that compresses visual tokens into adaptive, progressively refinable representations, enabling plug-and-play…
