Inference performance evaluation for LLMs on edge devices with a novel benchmarking framework and metric

Hao Chen; Cong Tian; Zixuan He; Bin Yu; Yepang Liu; Jialun Cao

arXiv:2508.11269·cs.PF·August 18, 2025

TL;DR

This paper introduces ELIB, a benchmarking tool, and MBU, a metric of how efficiently a model uses the available memory bandwidth, to evaluate and optimize LLM inference on edge devices with differing hardware characteristics and tight memory constraints.

Contribution

The paper presents a benchmarking tool and a new metric for assessing and improving LLM inference efficiency across diverse edge hardware platforms.

Findings

01

ELIB is deployed on three edge platforms and used to benchmark five quantized models.

02

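The MBU metric indicates what percentage of the available memory bandwidth a specific model uses on a given edge platform, which helps guide memory optimization during deployment.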

03

Analysis reveals key factors affecting LLM performance on edge hardware.

Abstract

With the significant success achieved by large language models (LLMs) like LLaMA, edge computing-based LLM inference services for mobile devices and PCs are in high demand for data privacy. However, edge platforms differ in their hardware characteristics, and the large demand for memory capacity and bandwidth makes it very challenging to deploy and benchmark LLMs on edge devices. In this paper, we introduce a benchmarking tool named ELIB (edge LLM inference benchmarking) to evaluate the LLM inference performance of different edge platforms, and we propose a novel metric named MBU, which indicates the percentage of the available memory bandwidth that a specific model running on edge hardware uses in theory, as a guide for optimizing memory usage. We deploy ELIB on three edge platforms and benchmark using five quantized models to optimize MBU in combination with other metrics such as FLOPS,…
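The abstract describes MBU only in words and does not give a formula. As a rough illustration, the sketch below uses one common formulation of bandwidth utilization from the wider LLM-inference literature: achieved bandwidth is approximated as the bytes read per generated token (model weights plus KV cache) multiplied by tokens per second, then divided by the hardware's peak memory bandwidth. This formula, the function name `estimate_mbu`, and all the numbers are assumptions for illustration and may not match the definition used in the paper.

```python
def estimate_mbu(param_bytes: float,
                 kv_cache_bytes: float,
                 tokens_per_second: float,
                 peak_bandwidth_bytes_per_s: float) -> float:
    """Return an estimated bandwidth-utilization fraction in [0, 1].

    Assumed formulation (not taken from the paper): each decoded token
    requires reading all weights and the KV cache from memory once.
    """
    if peak_bandwidth_bytes_per_s <= 0:
        raise ValueError("peak bandwidth must be positive")
    # Bytes moved per second at the observed decoding speed.
    achieved_bandwidth = (param_bytes + kv_cache_bytes) * tokens_per_second
    return achieved_bandwidth / peak_bandwidth_bytes_per_s


# Hypothetical inputs: a 4 GB quantized model, 0.5 GB of KV cache,
# 10 tokens/s decoding speed, and 100 GB/s peak memory bandwidth.
example_mbu = estimate_mbu(
    param_bytes=4e9,
    kv_cache_bytes=0.5e9,
    tokens_per_second=10.0,
    peak_bandwidth_bytes_per_s=100e9,
)
print(f"Estimated MBU: {example_mbu:.1%}")
```

Under this assumed formulation, a low value suggests that decoding is limited by something other than memory bandwidth (for example compute or software overhead), while a value close to 100% suggests the platform's memory bandwidth is the bottleneck.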
