edgeVLM: Cloud-edge Collaborative Real-time VLM based on Context Transfer
Chen Qian, Xinran Yu, Zewen Huang, Danyang Li, Qiang Ma, Fan Dang, Xuan Ding, Guangyong Shang, Zheng Yang

TL;DR
edgeVLM introduces a cloud-edge collaborative framework that leverages delayed LVLM outputs as context to improve real-time vision-language reasoning, addressing latency issues and enhancing accuracy.
Contribution
The paper proposes a novel Context Transfer paradigm and edgeVLM model that utilize delayed responses as historical context for improved real-time inference.
Findings
Effective in three vision-language reasoning tasks
Outperforms existing cloud-edge strategies
Enhances visual grounding consistency
Abstract
Vision-Language Models (VLMs) are increasingly deployed in real-time applications such as autonomous driving and human-computer interaction, which demand fast and reliable responses based on accurate perception. To meet these requirements, existing systems commonly employ cloud-edge collaborative architectures, such as partitioned Large Vision-Language Models (LVLMs) or task offloading strategies between Large and Small Vision-Language Models (SVLMs). However, these methods fail to accommodate cloud latency fluctuations and overlook the full potential of delayed but accurate LVLM responses. In this work, we propose a novel cloud-edge collaborative paradigm for VLMs, termed Context Transfer, which treats the delayed outputs of LVLMs as historical context to provide real-time guidance for SVLM inference. Based on this paradigm, we design edgeVLM, which incorporates both context…
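The scheduling idea behind Context Transfer, as stated in the abstract, can be illustrated with a toy simulation: the edge answers every frame immediately, and whatever LVLM response has most recently arrived from the cloud, however late, is reused as context. This is only a sketch under stated assumptions, not the authors' edgeVLM design (the abstract is truncated before describing its components). The names `CloudResponse`, `ContextBuffer`, `edge_infer`, and `simulate` are hypothetical, the SVLM is replaced by string formatting, and cloud latency is modeled as a per-frame delay measured in frames.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CloudResponse:
    frame_id: int  # frame the LVLM was asked about
    text: str      # the (delayed) LVLM answer


class ContextBuffer:
    """Holds the newest LVLM response that has reached the edge (hypothetical)."""

    def __init__(self) -> None:
        self.latest: Optional[CloudResponse] = None

    def update(self, resp: CloudResponse) -> None:
        # Keep only the response for the most recent frame; a reply for an
        # older frame that arrives out of order is ignored.
        if self.latest is None or resp.frame_id > self.latest.frame_id:
            self.latest = resp


def edge_infer(frame_id: int, buffer: ContextBuffer) -> str:
    """Placeholder for SVLM inference conditioned on delayed cloud context."""
    ctx = buffer.latest
    if ctx is None:
        return f"frame {frame_id}: SVLM answer (no cloud context)"
    staleness = frame_id - ctx.frame_id
    return (f"frame {frame_id}: SVLM answer guided by '{ctx.text}' "
            f"(stale by {staleness})")


def simulate(num_frames: int, cloud_delay: dict[int, int]) -> list[str]:
    """cloud_delay maps a frame id to how many frames its LVLM reply takes."""
    buffer = ContextBuffer()
    in_flight: list[tuple[int, CloudResponse]] = []  # (arrival frame, reply)
    outputs: list[str] = []
    for t in range(num_frames):
        # Frames listed in cloud_delay are also sent to the cloud LVLM.
        if t in cloud_delay:
            reply = CloudResponse(t, f"LVLM answer for frame {t}")
            in_flight.append((t + cloud_delay[t], reply))
        # Deliver every reply whose (fluctuating) latency has elapsed.
        arrived = [r for (due, r) in in_flight if due <= t]
        in_flight = [(due, r) for (due, r) in in_flight if due > t]
        for r in arrived:
            buffer.update(r)
        # The edge never blocks on the cloud: it answers right away.
        outputs.append(edge_infer(t, buffer))
    return outputs


if __name__ == "__main__":
    for line in simulate(4, {0: 2, 1: 1}):
        print(line)
```

The design choice being illustrated is that cloud latency only affects how stale the context is, never whether the edge can respond; the `update` rule also tolerates out-of-order replies when latency fluctuates.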
