DAT: Dual-Aware Adaptive Transmission for Efficient Multimodal LLM Inference in Edge-Cloud Systems

Qi Guo, Zheming Yang, Yunqing Hu, Chang Zhao, and Wen Ji

arXiv:2604.05375 · cs.MM · April 8, 2026

TL;DR

DAT combines a collaborative small-large model cascade with an adaptive transmission strategy for multimodal LLM inference in edge-cloud systems, reducing bandwidth usage and cutting weighted semantic alert delay by up to 77.5% under congestion while maintaining 98.83% recognition accuracy.

Contribution

The paper proposes a dual-aware adaptive transmission framework that pairs a small-large model cascade with bandwidth-aware optimization for multimodal LLM inference in bandwidth-constrained edge-cloud systems.

Findings

01. Achieves 98.83% recognition accuracy and 100% output consistency.

02. Reduces weighted semantic alert delay by up to 77.5% under congestion.

03. Delivers 98.33% of visual evidence within 0.5 seconds.

Abstract

Multimodal large language models (MLLMs) have shown strong capability in semantic understanding and visual reasoning, yet their use on continuous video streams in bandwidth-constrained edge-cloud systems incurs prohibitive computation and communication overhead and hinders low-latency alerting and effective visual evidence delivery. To address this challenge, we propose DAT to achieve high-quality semantic generation, low-latency event alerting, and effective visual evidence supplementation. To reduce unnecessary deep reasoning costs, we propose a collaborative small-large model cascade. A lightweight edge-side small model acts as a gating module to filter non-target-event frames and perform object detection, triggering MLLM inference only for suspicious frames. Building on this, we introduce an efficient fine-tuning strategy with visual guidance and semantic prompting, which improves…
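The gating idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `run_cascade`, `Detection`, `Alert`, the `score_threshold` parameter, and the callable signatures for the two models are all hypothetical names chosen for illustration. It assumes the edge-side small model returns labeled detections with confidence scores, and that only frames containing a target-event detection are forwarded to the MLLM.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Set


@dataclass
class Detection:
    label: str    # object class reported by the small model
    score: float  # detection confidence in [0, 1]


@dataclass
class Alert:
    frame_id: int
    detections: List[Detection]
    description: str  # semantic output produced by the large model


def run_cascade(
    frames: Iterable[Any],
    small_model: Callable[[Any], List[Detection]],
    large_model: Callable[[Any, List[Detection]], str],
    target_labels: Set[str],
    score_threshold: float = 0.5,  # assumed gating threshold, not from the paper
) -> List[Alert]:
    """Run the small model on every frame; call the large model only on suspicious ones."""
    alerts: List[Alert] = []
    for frame_id, frame in enumerate(frames):
        detections = small_model(frame)
        # Gate: keep only confident detections of target-event classes.
        suspicious = [
            d for d in detections
            if d.label in target_labels and d.score >= score_threshold
        ]
        if not suspicious:
            continue  # non-target frame: skip the expensive MLLM call
        # The detections are passed along so the large model can use them as guidance.
        description = large_model(frame, suspicious)
        alerts.append(Alert(frame_id, suspicious, description))
    return alerts
```

In this sketch the cost of the large model scales with the number of suspicious frames rather than with the length of the stream, which is the saving the cascade is meant to deliver.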
