Aligned Vector Quantization for Edge-Cloud Collaborative Vision-Language Models
Xiao Liu, Lijun Zhang, Deepak Ganesan, Hui Guan

TL;DR
This paper presents LLaVA-AlignedVQ, an edge-cloud collaborative vision-language system with a novel feature compression method that significantly reduces data transmission and speeds up inference while maintaining accuracy.
Contribution
It introduces AlignedVQ, a new vector quantization algorithm enabling efficient feature compression for edge-cloud VQA systems, balancing performance and resource use.
Findings
Achieves an approximately 1365x compression rate of intermediate features.
Reduces data transmission by 96.8% compared to JPEG90 images.
Speeds up inference by 2-15x with minimal accuracy loss.
Abstract
Vision Language Models (VLMs) are central to Visual Question Answering (VQA) systems and are typically deployed in the cloud due to their high computational demands. However, this cloud-only approach underutilizes edge computational resources and requires significant bandwidth for transmitting raw images. In this paper, we introduce an edge-cloud collaborative VQA system, called LLaVA-AlignedVQ, which features a novel Aligned Vector Quantization algorithm (AlignedVQ) that efficiently compresses intermediate features without compromising accuracy to support partitioned execution. Our experiments demonstrate that LLaVA-AlignedVQ achieves approximately 1365x compression rate of intermediate features, reducing data transmission overhead by 96.8% compared to transmitting JPEG90-compressed images to the cloud. LLaVA-AlignedVQ achieves an inference speedup of 2-15x while maintaining high…
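To make the idea of compressing intermediate features concrete, here is a minimal sketch of plain nearest-neighbor vector quantization in Python. It is not the AlignedVQ algorithm, whose details are not given in this summary. The codebook, feature values, dimensions, and bit widths are all hypothetical, chosen only to show how replacing a feature vector with a codebook index shrinks what the edge device would need to transmit.

```python
import math


def quantize(features, codebook):
    """Return, for each feature vector, the index of its nearest codeword (squared L2)."""
    indices = []
    for vec in features:
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, codebook[k])),
        )
        indices.append(nearest)
    return indices


def dequantize(indices, codebook):
    """Reconstruct approximate feature vectors from codebook indices."""
    return [codebook[i] for i in indices]


def compression_ratio(dim, bits_per_value, codebook_size):
    """Bits for one raw vector divided by bits for one codebook index."""
    raw_bits = dim * bits_per_value
    index_bits = math.ceil(math.log2(codebook_size))
    return raw_bits / index_bits


# Hypothetical toy setup: 16 codewords of dimension 4, codeword k = [k, k, k, k].
codebook = [[float(k)] * 4 for k in range(16)]

# Hypothetical "intermediate features" lying close to codewords 3, 7, and 12.
features = [
    [3.1, 2.9, 3.0, 3.2],
    [7.0, 7.1, 6.9, 7.0],
    [11.8, 12.1, 12.0, 12.2],
]

indices = quantize(features, codebook)        # what the edge would transmit
reconstructed = dequantize(indices, codebook)  # what the cloud would recover
ratio = compression_ratio(dim=4, bits_per_value=16, codebook_size=16)
```

In this toy setup each 4-value, 16-bit vector (64 bits) is replaced by a 4-bit index, so the ratio is 16. Real systems use much larger feature dimensions and codebooks, which is where far higher ratios can come from. The ratio here ignores the codebook itself, on the assumption that both sides already hold a copy.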
