An Evaluation of LLMs Inference on Popular Single-board Computers

Tung (Thomas) Nguyen; Tuyen Nguyen

arXiv:2511.07425·cs.DC·November 12, 2025

TL;DR

This paper benchmarks 25 quantized open-source LLMs on popular single-board computers, analyzing performance, power, and memory to guide efficient edge deployment of language models.

Contribution

The paper provides the first comprehensive evaluation of LLM inference on SBCs, comparing inference runtimes and characterizing hardware limitations for models up to 1.5B parameters.

Findings

01

SBCs can reliably support models up to 1.5B parameters

02

Llamafile outperforms Ollama, with up to 4x higher throughput and 30-40% lower power usage

03

The study identifies architecture-specific bottlenecks and deployment trade-offs

Abstract

The growing demand for on-device large language model (LLM) inference is driving interest in deploying lightweight, cost-effective AI solutions on edge hardware. Single-board computers (SBCs) such as the Raspberry Pi and Orange Pi offer a promising platform for localized, privacy-preserving inference, but remain underexplored in the context of LLM workloads. In this work, we benchmark the performance of 25 quantized open-source LLMs across three SBCs (Raspberry Pi 4, Raspberry Pi 5, and Orange Pi 5 Pro) using two inference runtimes: Ollama and Llamafile. We evaluate generation throughput, memory usage, and power consumption under varying CPU configurations, using multiple prompt types to simulate realistic workloads. Our results show that SBCs can reliably support models up to 1.5B parameters, with Llamafile achieving up to 4x higher throughput and 30-40% lower power usage than Ollama. We…
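The throughput and power figures quoted above can be combined into a rough energy-per-token comparison. The sketch below is illustrative arithmetic only, not a result from the paper: it assumes "power usage" means average power draw during generation, and that the best-case throughput and power figures occur together, which the abstract does not state. The function name `energy_per_token_ratio` is hypothetical.

```python
def energy_per_token_ratio(throughput_ratio: float, power_ratio: float) -> float:
    """Relative energy per generated token of runtime A versus runtime B.

    Energy per token is average power divided by throughput (tokens/s),
    so the A-to-B ratio is power_ratio / throughput_ratio.
    """
    if throughput_ratio <= 0 or power_ratio <= 0:
        raise ValueError("ratios must be positive")
    return power_ratio / throughput_ratio


# Assumption: best-case figures from the abstract taken together.
# "Up to 4x higher throughput" -> throughput_ratio = 4.0
# "30-40% lower power usage"   -> power_ratio between 0.60 and 0.70
for power_ratio in (0.60, 0.70):
    ratio = energy_per_token_ratio(4.0, power_ratio)
    print(f"power ratio {power_ratio:.2f}: energy per token ratio {ratio:.3f}")
```

Under these assumptions, Llamafile would use roughly 15-17.5% of the energy per token that Ollama uses in the best case. Because both figures are upper bounds, typical ratios are likely less favorable.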

Taxonomy

Topics: Big Data and Digital Economy · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques