AdaVFM: Adaptive Vision Foundation Models for Edge Intelligence via LLM-Guided Execution

Yiwei Zhao; Yi Zheng; Huapeng Su; Jieyu Lin; Stefano Ambrogio; Cijo Jose; Michael Ramamonjisoa; Patrick Labatut; Barbara De Salvo; Chiao Liu; Phillip B. Gibbons; Ziyun Li

arXiv:2604.15622·cs.CV·May 5, 2026

TL;DR

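AdaVFM is an adaptive framework that adjusts, at inference time, how much of a language-aligned vision foundation model runs on an edge device. It combines neural architecture search (NAS), which supplies lightweight subnets, with guidance from a cloud-hosted multimodal LLM, trading accuracy against compute according to scene context and task complexity.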

Contribution

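The paper presents a runtime-adaptive execution strategy for language-aligned vision foundation models: NAS is integrated into the backbone so that lightweight subnets can run on-device, and a multimodal LLM acts as a context-aware agent that controls execution under diverse conditions.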

Findings

01. Achieves up to 7.9% higher accuracy on IN1K

02. Surpasses prior models by 5.2% mIoU on ADE20K

03. Reduces FLOPs by up to 77.9% for similar accuracy
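To put the FLOPs figure in more familiar terms, the short sketch below converts a percentage reduction into a "times fewer operations" factor. The helper name `flops_reduction_factor` is an illustrative assumption, not something from the paper, and a FLOPs ratio does not necessarily translate into the same latency or energy ratio on real hardware.

```python
def flops_reduction_factor(reduction_pct: float) -> float:
    """Return how many times fewer FLOPs remain after a percentage reduction.

    Hypothetical helper for illustration only; not part of AdaVFM.
    """
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction_pct must be in [0, 100)")
    remaining_fraction = 1.0 - reduction_pct / 100.0
    return 1.0 / remaining_fraction


# The reported best case: a 77.9% reduction leaves 22.1% of the FLOPs.
factor = flops_reduction_factor(77.9)
print(f"{factor:.2f}x fewer FLOPs")  # prints "4.52x fewer FLOPs"
```

In other words, the reported best case corresponds to roughly 4.5 times fewer operations at similar accuracy.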

Abstract

Language-aligned vision foundation models (VFMs) enable versatile visual understanding for always-on contextual AI, but their deployment on edge devices is hindered by strict latency and power constraints. We present AdaVFM, an adaptive framework for efficient on-device inference of language-aligned VFMs that dynamically adjusts computation based on scene context and task complexity. Our key insight is that the effect of model size reduction on performance is task-dependent in vision applications, motivating a runtime-adaptive execution strategy. AdaVFM integrates neural architecture search (NAS) into the language-aligned VFM backbone to enable lightweight subnet execution at runtime. A multimodal large language model (LLM) deployed in the cloud enables runtime control through a context-aware agent. This synergy allows efficient model adaptation under diverse conditions while…
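The runtime-adaptive idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the subnet menu, the relative-FLOPs values, the difficulty labels, and the `select_subnet` policy are hypothetical and are not taken from the paper. In AdaVFM the control signal comes from a cloud-hosted multimodal LLM agent; in this sketch that agent is stubbed out as a plain difficulty string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubnetConfig:
    """One subnet of a NAS-trained backbone (all fields are hypothetical)."""
    name: str
    depth: int         # number of blocks kept
    width_mult: float  # fraction of channels kept
    rel_flops: float   # cost relative to the full backbone


# Made-up menu of subnets; a real supernet would expose many more.
SUBNETS = [
    SubnetConfig("tiny", depth=6, width_mult=0.50, rel_flops=0.25),
    SubnetConfig("small", depth=8, width_mult=0.75, rel_flops=0.45),
    SubnetConfig("base", depth=12, width_mult=1.00, rel_flops=1.00),
]

# Made-up mapping from the agent's difficulty hint to a minimum capacity.
MIN_REL_FLOPS = {"easy": 0.0, "medium": 0.4, "hard": 0.9}


def select_subnet(difficulty: str, flops_budget: float) -> SubnetConfig:
    """Pick the cheapest subnet that satisfies the difficulty hint.

    If no affordable subnet reaches the requested capacity, fall back to
    the largest subnet that fits the budget; if nothing fits at all,
    return the smallest subnet.
    """
    affordable = [s for s in SUBNETS if s.rel_flops <= flops_budget]
    if not affordable:
        return min(SUBNETS, key=lambda s: s.rel_flops)
    floor = MIN_REL_FLOPS[difficulty]
    adequate = [s for s in affordable if s.rel_flops >= floor]
    if adequate:
        return min(adequate, key=lambda s: s.rel_flops)
    return max(affordable, key=lambda s: s.rel_flops)


# Stand-in for the agent's output: an easy scene under a generous budget.
print(select_subnet("easy", flops_budget=1.0).name)  # prints "tiny"
# A hard scene under a tight budget gets the largest subnet that fits.
print(select_subnet("hard", flops_budget=0.5).name)  # prints "small"
```

The point of the sketch is the split of responsibilities: the expensive reasoning about scene context happens off-device and only produces a small control signal, while the device performs a cheap lookup to decide which subnet to execute.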
