EdgeFM: Efficient Edge Inference for Vision-Language Models
Mengling Deng, Yuanpeng Chen, Sheng Yang, Wei Tao, Wenhai Zhang, Hui Song, Linyuanhao Qin, Kai Zhao, Xiaojun Ye, Shanhui Mo, Jingli Fan, Shuang Zhang, Bei Liu, Tiankun Zhao, Xiangjing An

TL;DR
EdgeFM is a lightweight, cross-platform inference framework for vision-language models (VLMs) that reduces inference latency on edge devices and outperforms vendor-specific solutions.
Contribution
It introduces a modular, agent-driven approach to generating optimized kernels, enabling better cross-platform deployment and performance for industrial edge applications.
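One common way to make tuned kernels modular and portable is a registry keyed by operator and platform, with a portable fallback. The sketch below illustrates that general pattern only; it is not EdgeFM's code, and `register`, `dispatch`, the platform tags `"generic"`/`"orin"` and the pure-Python `rmsnorm_generic` reference kernel are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Tuple

KernelFn = Callable[..., object]

# Hypothetical registry keyed by (operator name, platform tag).
_REGISTRY: Dict[Tuple[str, str], KernelFn] = {}


def register(op: str, platform: str) -> Callable[[KernelFn], KernelFn]:
    """Decorator that records `fn` as the kernel for `op` on `platform`."""
    def wrap(fn: KernelFn) -> KernelFn:
        _REGISTRY[(op, platform)] = fn
        return fn
    return wrap


def dispatch(op: str, platform: str) -> KernelFn:
    """Prefer a platform-tuned kernel; fall back to the portable "generic" one."""
    for key in ((op, platform), (op, "generic")):
        if key in _REGISTRY:
            return _REGISTRY[key]
    raise KeyError(f"no kernel registered for {op!r}")


@register("rmsnorm", "generic")
def rmsnorm_generic(x, eps=1e-6):
    # Portable reference kernel: x / sqrt(mean(x^2) + eps).
    mean_sq = sum(v * v for v in x) / len(x)
    scale = (mean_sq + eps) ** -0.5
    return [v * scale for v in x]


# Usage: no Orin-specific kernel is registered yet, so the generic one is used.
kernel = dispatch("rmsnorm", "orin")
print(kernel([3.0, 4.0]))
```

Adding a tuned kernel for a new platform then means registering one more function under the same operator name; callers that go through `dispatch` need no changes.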
Findings
Achieves up to 1.49x speedup over TensorRT-Edge-LLM on NVIDIA Orin.
Supports mainstream platforms including x86 and NVIDIA Orin SoCs.
Provides an open-source, production-grade solution for industrial edge scenarios.
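To put the headline number in latency terms, assume (this summary does not say so) that the speedup is the ratio of end-to-end latency for the same workload on the same Orin device. Under that assumption, the best reported case corresponds to roughly a one-third latency reduction:

```latex
S = \frac{T_{\text{TensorRT-Edge-LLM}}}{T_{\text{EdgeFM}}} = 1.49
\;\Rightarrow\;
T_{\text{EdgeFM}} = \frac{T_{\text{TensorRT-Edge-LLM}}}{1.49} \approx 0.671\,T_{\text{TensorRT-Edge-LLM}},
\qquad
1 - \frac{1}{1.49} \approx 0.329 \;(\approx 33\%\ \text{lower latency})
```

Because the finding is stated as "up to" 1.49x, this is an upper bound across the reported workloads, not a typical figure.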
Abstract
Vision-language models (VLMs) have demonstrated strong applicability in industrial edge applications, yet their deployment remains severely constrained by requirements for deterministic low latency and stable execution under limited resources. Existing frameworks either rely on bloated general-purpose designs or force developers into opaque, hardware-specific, closed-source ecosystems, leading to hardware lock-in and poor cross-platform adaptability. Observing that modern AI agents can efficiently search and tune configurations to generate highly optimized low-level kernels for standard LLM operators, we propose EdgeFM, a lightweight, agent-driven VLM/LLM inference framework tailored for cross-platform industrial edge deployment. EdgeFM removes non-essential features to reduce single-request latency, and encapsulates agent-tuned kernel optimizations as a modular library of…
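Searching and tuning kernel configurations ultimately rests on an empirical loop: enumerate candidate configurations, time each one, keep the fastest. The sketch below is a plain exhaustive-search version of that loop, not EdgeFM's agent-driven tuner (an agent would presumably propose candidates more selectively). `autotune`, the `space` dictionary, and the `chunk` parameter in the usage lines are hypothetical names; the injectable `timer` exists only so the loop can be tested deterministically.

```python
import itertools
import time
from typing import Callable, Dict, Iterable, Optional, Tuple

Config = Dict[str, int]


def autotune(run: Callable[[Config], object],
             space: Dict[str, Iterable[int]],
             repeats: int = 3,
             timer: Callable[[], float] = time.perf_counter,
             ) -> Tuple[Optional[Config], float]:
    """Time every configuration in `space` and return the fastest one.

    `run` executes the kernel once with a given config. Each config gets one
    untimed warm-up call, then the best of `repeats` timings is kept to
    reduce noise.
    """
    keys = list(space)
    best_cfg: Optional[Config] = None
    best_t = float("inf")
    for values in itertools.product(*(list(space[k]) for k in keys)):
        cfg = dict(zip(keys, values))
        run(cfg)  # warm-up, not timed
        t = float("inf")
        for _ in range(repeats):
            start = timer()
            run(cfg)
            t = min(t, timer() - start)
        if t < best_t:
            best_cfg, best_t = cfg, t
    return best_cfg, best_t


# Usage: pick a chunk size for a toy chunked sum (illustrative only).
data = list(range(100_000))


def chunked_sum(cfg: Config) -> int:
    step = cfg["chunk"]
    return sum(sum(data[i:i + step]) for i in range(0, len(data), step))


best, seconds = autotune(chunked_sum, {"chunk": [64, 256, 1024]})
print(best, seconds)
```

Exhaustive search is only practical for small spaces; the point of an agent-guided search is to prune a much larger space before anything is timed.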
