EdgeFM: Efficient Edge Inference for Vision-Language Models

Mengling Deng; Yuanpeng Chen; Sheng Yang; Wei Tao; Wenhai Zhang; Hui Song; Linyuanhao Qin; Kai Zhao; Xiaojun Ye; Shanhui Mo; Jingli Fan; Shuang Zhang; Bei Liu; Tiankun Zhao; Xiangjing An

arXiv:2604.27476·cs.CV·May 1, 2026

TL;DR

EdgeFM is a lightweight, cross-platform inference framework for vision-language models that reduces single-request latency on edge devices, reporting up to a 1.49x speedup over the vendor-specific TensorRT-Edge-LLM on NVIDIA Orin.

Contribution

EdgeFM introduces a modular, agent-driven approach in which AI agents search and tune configurations to generate optimized low-level kernels, enabling better cross-platform deployment and performance for industrial edge applications.

Findings

01. Achieves up to a 1.49x speedup over TensorRT-Edge-LLM on NVIDIA Orin.

02. Supports mainstream platforms, including x86 and NVIDIA Orin SoCs.

03. Provides an open-source, production-grade solution for industrial edge scenarios.

Abstract

Vision-language models (VLMs) have demonstrated strong applicability in edge industrial applications, yet their deployment remains severely constrained by requirements for deterministic low latency and stable execution under resource limitations. Existing frameworks either rely on bloated general-purpose designs or force developers into opaque, hardware-specific, closed-source ecosystems, leading to hardware lock-in and poor cross-platform adaptability. Observing that modern AI agents can efficiently search and tune configurations to generate highly optimized low-level kernels for standard LLM operators, we propose EdgeFM, a lightweight, agent-driven VLM/LLM inference framework tailored for cross-platform industrial edge deployment. EdgeFM removes non-essential features to reduce single-request latency, and encapsulates agent-tuned kernel optimizations as a modular library of…
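
The idea of searching and tuning kernel configurations can be illustrated with a small sketch. This is not EdgeFM's agent or its kernel library: it is a plain exhaustive search over a hypothetical tile-size parameter for a pure-Python blocked matrix multiply, used here only as a stand-in for the kind of configuration search the abstract mentions. All names (`tune`, `candidate_configs`, `blocked_matmul`, `measure_latency`, the `tile` parameter) are assumptions made for illustration.

```python
import itertools
import statistics
import time
from typing import Callable, Dict, Iterable, List, Tuple

Config = Dict[str, int]


def candidate_configs(space: Dict[str, List[int]]) -> Iterable[Config]:
    """Enumerate every combination in a small discrete search space."""
    keys = sorted(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def tune(space: Dict[str, List[int]],
         cost: Callable[[Config], float]) -> Tuple[Config, float]:
    """Return the configuration with the lowest cost (exhaustive search).

    A real agent-driven tuner would explore the space more cleverly;
    exhaustive search is a deliberately simple substitute.
    """
    best_cfg, best_cost = None, float("inf")
    for cfg in candidate_configs(space):
        c = cost(cfg)
        if c < best_cost:
            best_cfg, best_cost = cfg, c
    if best_cfg is None:
        raise ValueError("empty search space")
    return best_cfg, best_cost


def blocked_matmul(a: List[List[float]], b: List[List[float]],
                   tile: int) -> List[List[float]]:
    """Multiply a (n x k) by b (k x m), iterating in tile x tile blocks."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    row_a, row_out = a[i], out[i]
                    for kk in range(k0, min(k0 + tile, k)):
                        a_ik, row_b = row_a[kk], b[kk]
                        for j in range(j0, min(j0 + tile, m)):
                            row_out[j] += a_ik * row_b[j]
    return out


def measure_latency(fn: Callable[[], object], repeats: int = 3) -> float:
    """Median wall-clock time of fn() in seconds over a few repeats."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    size = 16
    a = [[float(i + j) for j in range(size)] for i in range(size)]
    b = [[float(i - j) for j in range(size)] for i in range(size)]
    # Cost of a config = measured latency of the kernel under that config.
    best, latency = tune(
        {"tile": [2, 4, 8, 16]},
        lambda cfg: measure_latency(lambda: blocked_matmul(a, b, cfg["tile"])),
    )
    print(f"best config: {best}, median latency: {latency:.6f} s")
```

Passing the cost function into `tune` keeps the search separate from the measurement, so the same loop could score a configuration by latency on whichever platform it runs on. The winning tile size depends on the machine, so no expected output is shown.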
