RooflineBench: A Benchmarking Framework for On-Device LLMs via Roofline Analysis

Zhen Bi; Xueshu Chen; Luoyang Sun; Yuhang Yao; Qing Shen; Jungang Lou; Cheng Deng

arXiv:2602.11506 · cs.LG · March 16, 2026

TL;DR

This paper introduces RooflineBench, a benchmarking framework based on the Roofline model, to evaluate and compare the performance and efficiency of on-device Large Language Models across heterogeneous hardware platforms.

Contribution

The paper proposes a systematic Roofline-based framework and a new metric, Relative Inference Potential, for analyzing LLM performance on resource-constrained hardware.

Findings

01 Performance and operational intensity vary with sequence length.

02 Model depth regression impacts operational intensity.

03 Structural refinements like Multi-head Latent Attention improve inference efficiency.

Abstract

The transition toward localized intelligence through Small Language Models (SLMs) has intensified the need for rigorous performance characterization on resource-constrained edge hardware. However, objectively measuring the theoretical performance ceilings of diverse architectures across heterogeneous platforms remains a formidable challenge. In this work, we propose a systematic framework based on the Roofline model that unifies architectural primitives and hardware constraints through the lens of operational intensity (OI). By defining an inference-potential region, we introduce the Relative Inference Potential as a novel metric to compare efficiency differences between Large Language Models (LLMs) on the same hardware substrate. Extensive empirical analysis across diverse compute tiers reveals that variations in performance and OI are significantly influenced by sequence length. We…
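For readers unfamiliar with the Roofline model that the abstract builds on, the sketch below shows the standard bound: attainable throughput is the smaller of peak compute and operational intensity times memory bandwidth. This is the textbook formulation, not the paper's implementation. The function names and the hardware figures are hypothetical, chosen only for illustration, and the Relative Inference Potential metric is not modeled because its definition does not appear in this excerpt.

```python
def operational_intensity(flops: float, bytes_moved: float) -> float:
    """Operational intensity (OI): FLOPs performed per byte of memory traffic."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


def ridge_point(peak_flops: float, mem_bandwidth: float) -> float:
    """OI at which a workload moves from memory-bound to compute-bound."""
    if mem_bandwidth <= 0:
        raise ValueError("mem_bandwidth must be positive")
    return peak_flops / mem_bandwidth


def attainable_flops(oi: float, peak_flops: float, mem_bandwidth: float) -> float:
    """Classic Roofline bound: min(peak compute, OI * memory bandwidth)."""
    if oi < 0:
        raise ValueError("oi must be non-negative")
    return min(peak_flops, oi * mem_bandwidth)


# Hypothetical edge device (illustrative numbers, not taken from the paper):
# 1 TFLOP/s peak compute and 50 GB/s memory bandwidth.
PEAK_FLOPS = 1e12
MEM_BANDWIDTH = 50e9

if __name__ == "__main__":
    ridge = ridge_point(PEAK_FLOPS, MEM_BANDWIDTH)
    print(f"ridge point: {ridge} FLOP/byte")
    for oi in (2.0, 40.0):
        bound = attainable_flops(oi, PEAK_FLOPS, MEM_BANDWIDTH)
        regime = "memory-bound" if oi < ridge else "compute-bound"
        print(f"OI={oi}: bound={bound:.3e} FLOP/s ({regime})")
```

Under these assumed numbers, a workload with an OI below the ridge point is limited by memory bandwidth, while one above it is capped by peak compute. That distinction is why the sequence-length dependence of OI reported in the abstract matters for on-device inference.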

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning in Materials Science · Parallel Computing and Optimization Techniques