MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices?

Xingze Zou; Jing Wang; Yuhua Zheng; Xueyi Chen; Haolei Bai; Lingcheng Kong; Syed A.R. Abu-Bakar; Zhaode Wang; Chengfei Lv; Haoji Hu; Huan Wang

arXiv:2603.11935 · cs.LG · March 17, 2026

Open Access

TL;DR

This paper evaluates the ability of large language models to generate efficient kernels for mobile devices, introduces a benchmark and framework for testing, and proposes a multi-agent system that raises kernel compilation success to 93.7% and improves kernel performance.

Contribution

The paper introduces MobileKernelBench, a comprehensive evaluation framework, and proposes MoKA, a multi-agent system that improves both the success rate and the efficiency of kernel generation for mobile devices.

Findings

01

Current LLMs have high failure rates (>54%) in mobile kernel generation.

02

MoKA improves compilation success to 93.7%.

03

27.4% of generated kernels show measurable speedups.
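The headline rates above can be derived from per-kernel outcomes. The sketch below is a minimal illustration under stated assumptions, not the benchmark's actual scoring code. The `KernelResult` record and the `summarize` helper are hypothetical. A kernel counts as failed if it either does not compile or produces wrong output. Speedup is taken as baseline latency divided by generated-kernel latency, where the baseline is assumed to be the existing MNN CPU kernel for the same operator.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KernelResult:
    """Hypothetical outcome record for one LLM-generated kernel."""
    compiled: bool                        # built successfully inside the framework
    correct: bool                         # on-device output matched the reference
    baseline_ms: Optional[float] = None   # latency of the existing kernel (assumed baseline)
    generated_ms: Optional[float] = None  # latency of the generated kernel


def summarize(results: List[KernelResult], speedup_threshold: float = 1.0) -> dict:
    """Aggregate compile, failure, and speedup rates over all generated kernels."""
    n = len(results)
    if n == 0:
        raise ValueError("no results to summarize")
    compiled = sum(r.compiled for r in results)
    # A kernel "passes" only if it both compiles and is numerically correct.
    passing = [r for r in results if r.compiled and r.correct]
    # Count passing kernels whose baseline/generated latency ratio exceeds the threshold;
    # kernels without (non-zero) timings are skipped.
    faster = sum(
        1
        for r in passing
        if r.baseline_ms and r.generated_ms
        and r.baseline_ms / r.generated_ms > speedup_threshold
    )
    return {
        "compile_rate": compiled / n,
        "failure_rate": 1 - len(passing) / n,
        "speedup_rate": faster / n,  # denominator: all generated kernels
    }


if __name__ == "__main__":
    results = [
        KernelResult(True, True, baseline_ms=2.0, generated_ms=1.0),   # correct, 2x faster
        KernelResult(True, True, baseline_ms=1.0, generated_ms=1.25),  # correct, slower
        KernelResult(True, False),                                     # builds, wrong output
        KernelResult(False, False),                                    # build error
    ]
    print(summarize(results))
    # {'compile_rate': 0.75, 'failure_rate': 0.5, 'speedup_rate': 0.25}
```

This sketch counts speedups over all generated kernels rather than over only the correct ones, which matches the wording of finding 03. Whether the paper uses this denominator and a threshold of 1.0 is an assumption.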

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities in code generation, yet their potential for generating kernels specifically for mobile devices remains largely unexplored. In this work, we extend the scope of automated kernel generation to the mobile domain to investigate the central question: Can LLMs write efficient kernels for mobile devices? To enable systematic investigation, we introduce MobileKernelBench, a comprehensive evaluation framework comprising a benchmark prioritizing operator diversity and cross-framework interoperability, coupled with an automated pipeline that bridges the host-device gap for on-device verification. Leveraging this framework, we conduct extensive evaluation on the CPU backend of Mobile Neural Network (MNN), revealing that current LLMs struggle with the engineering complexity and data scarcity inherent to mobile frameworks;…
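The abstract mentions an automated pipeline that bridges the host-device gap for on-device verification, but this page does not describe how it works. The following is a minimal sketch of one plausible loop, not the paper's implementation. It makes several assumptions: an Android device is reachable over `adb`; a test binary is cross-compiled on the host and writes its output tensor to a file; and that output is compared against a host-side reference with a floating-point tolerance. The helper names, the `out.bin` file, the device serial, and the tolerances are all hypothetical.

```python
import math
import posixpath
from typing import List, Optional, Sequence


def adb(args: Sequence[str], serial: Optional[str] = None) -> List[str]:
    """Build an adb command, optionally targeting one device by serial number."""
    cmd = ["adb"]
    if serial:
        cmd += ["-s", serial]
    return cmd + list(args)


def verification_commands(local_binary: str,
                          device_dir: str = "/data/local/tmp",
                          serial: Optional[str] = None) -> List[List[str]]:
    """Commands to push a cross-compiled test binary, run it on-device, and pull its output.

    Assumes the binary writes its result tensor to 'out.bin' in its working directory
    (a hypothetical convention).
    """
    name = posixpath.basename(local_binary)
    remote = posixpath.join(device_dir, name)
    return [
        adb(["push", local_binary, remote], serial),
        adb(["shell", "chmod", "755", remote], serial),
        # This single argument is handed to the device shell, so '&&' runs on the device.
        adb(["shell", f"cd {device_dir} && ./{name}"], serial),
        adb(["pull", posixpath.join(device_dir, "out.bin"), "out.bin"], serial),
    ]


def outputs_match(got: Sequence[float], want: Sequence[float],
                  rel_tol: float = 1e-3, abs_tol: float = 1e-5) -> bool:
    """Element-wise check of device output against a host reference (tolerances assumed)."""
    return len(got) == len(want) and all(
        math.isclose(g, w, rel_tol=rel_tol, abs_tol=abs_tol) for g, w in zip(got, want)
    )


if __name__ == "__main__":
    # Dry run: print the command sequence instead of executing it.
    for cmd in verification_commands("build/kernel_test", serial="emulator-5554"):
        print(" ".join(cmd))
```

Each command list can be executed with `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution means the pipeline can be dry-run on a host with no device attached.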

Taxonomy

Topics: Machine Learning in Materials Science · Topic Modeling · Software System Performance and Reliability