AdaVFM: Adaptive Vision Foundation Models for Edge Intelligence via LLM-Guided Execution
Yiwei Zhao, Yi Zheng, Huapeng Su, Jieyu Lin, Stefano Ambrogio, Cijo Jose, Michael Ramamonjisoa, Patrick Labatut, Barbara De Salvo, Chiao Liu, Phillip B. Gibbons, Ziyun Li

TL;DR
AdaVFM introduces an adaptive framework that dynamically adjusts vision foundation models on edge devices using neural architecture search and LLM guidance, balancing accuracy and efficiency.
Contribution
It presents a runtime-adaptive execution strategy for vision models, integrating NAS and LLMs for efficient on-device inference under diverse conditions.
Findings
Achieves up to 7.9% higher accuracy on ImageNet-1K (IN1K)
Surpasses prior models by 5.2% mIoU on ADE20K
Reduces FLOPs by up to 77.9% for similar accuracy
Abstract
Language-aligned vision foundation models (VFMs) enable versatile visual understanding for always-on contextual AI, but their deployment on edge devices is hindered by strict latency and power constraints. We present AdaVFM, an adaptive framework for efficient on-device inference of language-aligned VFMs that dynamically adjusts computation based on scene context and task complexity. Our key insight is that the effect of model size reduction on performance is task-dependent in vision applications, motivating a runtime-adaptive execution strategy. AdaVFM integrates neural architecture search (NAS) into the language-aligned VFM backbone to enable lightweight subnet execution during runtime. A multimodal large language model (LLM) deployed on the cloud enables runtime control with a context-aware agent. This synergy allows efficient model adaptation under diverse conditions while…
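The runtime-adaptive idea in the abstract (the cost of shrinking the model depends on the task, so a controller picks a subnet per situation) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Subnet` class, the `pick_subnet` function, the task labels, and every number below are hypothetical placeholders chosen for illustration. The task label and accuracy target stand in for whatever decision the cloud-hosted LLM agent would supply.

```python
from dataclasses import dataclass, field


@dataclass
class Subnet:
    """One candidate subnet of a NAS-trained backbone (hypothetical)."""
    name: str
    gflops: float
    # Estimated accuracy per task in [0, 1]. Placeholder values, not paper results.
    est_accuracy: dict = field(default_factory=dict)


def pick_subnet(subnets, task, min_accuracy):
    """Return the cheapest subnet meeting the accuracy target for `task`.

    If no subnet meets the target, fall back to the most accurate one.
    """
    ok = [s for s in subnets if s.est_accuracy.get(task, 0.0) >= min_accuracy]
    if ok:
        return min(ok, key=lambda s: s.gflops)
    return max(subnets, key=lambda s: s.est_accuracy.get(task, 0.0))


# Illustrative table: the "coarse" task degrades little as the model shrinks,
# while the "fine" task degrades a lot (the task-dependence the abstract describes).
SUBNETS = [
    Subnet("small", 1.0, {"coarse": 0.90, "fine": 0.60}),
    Subnet("medium", 2.5, {"coarse": 0.92, "fine": 0.75}),
    Subnet("large", 4.5, {"coarse": 0.93, "fine": 0.85}),
]

if __name__ == "__main__":
    # In a real system the task and target would come from the runtime controller.
    print(pick_subnet(SUBNETS, "coarse", 0.90).name)
    print(pick_subnet(SUBNETS, "fine", 0.80).name)
```

With a table like this, an easy task can run on the smallest subnet while a harder task is routed to a larger one, which is where the compute savings in such a scheme would come from.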
