Collaborative Edge-to-Server Inference for Vision-Language Models

Soochang Song; Yongjune Kim

arXiv:2512.16349 · cs.CV · December 19, 2025

Open Access

TL;DR

This paper introduces a collaborative edge-to-server inference framework for vision-language models (VLMs) that reduces communication cost by requesting retransmission of a region of interest only when the server's first-pass output is not confident enough, preserving accuracy while limiting data transfer.

Contribution

The paper presents a two-stage inference approach that uses the VLM's internal attention to locate a region of interest and the min-entropy of the output tokens as a confidence measure to decide whether image regions are retransmitted, improving communication efficiency in VLM deployment.

Findings

01 Significantly reduces communication cost in VLM inference.

02 Maintains high inference accuracy with selective retransmission.

03 Effective across multiple VLM architectures.

Abstract

We propose a collaborative edge-to-server inference framework for vision-language models (VLMs) that reduces the communication cost while maintaining inference accuracy. In typical deployments, visual data captured at edge devices (clients) is transmitted to the server for VLM inference. However, resizing the original image (global image) to match the vision encoder's input resolution often discards fine-grained details, leading to accuracy degradation. To overcome this limitation, we design a two-stage framework. In the first stage, the server performs inference on the global image and identifies a region of interest (RoI) using the VLM's internal attention. The min-entropy of the output tokens is then computed as a confidence measure to determine whether retransmission is required. If the min-entropy exceeds a predefined threshold, the server requests the edge device to send a…
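
The first-stage decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that min-entropy is measured in bits as -log2 of the largest token probability, that per-token values are averaged over the output sequence, and that the RoI is the bounding box of attention cells above a fraction of the peak value. The names `min_entropy`, `needs_retransmission`, `attention_roi`, and `keep_fraction` are hypothetical. Because the abstract is truncated here, the sketch stops at the retransmission decision and does not model what the edge device sends.

```python
import math
from typing import List, Sequence, Tuple


def min_entropy(probs: Sequence[float]) -> float:
    """Min-entropy (in bits) of one token distribution: -log2(max_i p_i)."""
    return -math.log2(max(probs))


def needs_retransmission(token_probs: Sequence[Sequence[float]],
                         threshold: float) -> bool:
    """Return True when the confidence measure exceeds the threshold.

    Assumption of this sketch: the measure is the mean of the per-token
    min-entropies. The abstract does not state how tokens are aggregated.
    """
    scores = [min_entropy(p) for p in token_probs]
    return sum(scores) / len(scores) > threshold


def attention_roi(attn: List[List[float]],
                  keep_fraction: float = 0.5) -> Tuple[int, int, int, int]:
    """Bounding box (row_min, col_min, row_max, col_max) of the attention
    cells whose value is at least keep_fraction times the peak value.

    Assumption of this sketch: a simple relative threshold on a 2-D
    attention map over image patches.
    """
    peak = max(max(row) for row in attn)
    cells = [(r, c)
             for r, row in enumerate(attn)
             for c, value in enumerate(row)
             if value >= keep_fraction * peak]
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)


if __name__ == "__main__":
    # Toy first-stage outputs: two output tokens and a 3x3 attention map.
    uncertain_answer = [[0.5, 0.5], [0.4, 0.3, 0.3]]
    attention_map = [[0.0, 0.1, 0.0],
                     [0.0, 0.9, 0.8],
                     [0.0, 0.1, 0.0]]
    if needs_retransmission(uncertain_answer, threshold=0.5):
        print("request retransmission for RoI", attention_roi(attention_map))
```

A higher min-entropy means the most likely token holds less probability mass, so a higher threshold triggers fewer retransmission requests. The threshold value used above is arbitrary and chosen only for the toy example.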

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning