An Evaluation of LLMs Inference on Popular Single-board Computers
Tung (Thomas) Nguyen, Tuyen Nguyen

TL;DR
This paper benchmarks 25 quantized open-source LLMs on popular single-board computers, analyzing performance, power, and memory to guide efficient edge deployment of language models.
Contribution
It provides the first comprehensive evaluation of LLM inference on SBCs, comparing runtimes and hardware limitations for models up to 1.5B parameters.
Findings
SBCs support models up to 1.5B parameters
Llamafile outperforms Ollama in throughput and power efficiency
Identifies architecture-specific bottlenecks and deployment trade-offs
Abstract
The growing demand for on-device large language model (LLM) inference is driving interest in deploying lightweight, cost-effective AI solutions on edge hardware. Single-board computers (SBCs) such as the Raspberry Pi and Orange Pi offer a promising platform for localized, privacy-preserving inference, but remain underexplored in the context of LLM workloads. In this work, we benchmark the performance of 25 quantized open-source LLMs across three SBCs (Raspberry Pi 4, Raspberry Pi 5, and Orange Pi 5 Pro) using two inference runtimes: Ollama and Llamafile. We evaluate generation throughput, memory usage, and power consumption under varying CPU configurations, using multiple prompt types to simulate realistic workloads. Our results show that SBCs can reliably support models up to 1.5B parameters, with Llamafile achieving up to 4x higher throughput and 30-40% lower power usage than Ollama. We…
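The abstract compares runtimes by generation throughput and power draw. The sketch below shows one common way such figures are derived from raw per-run measurements. It assumes that throughput means tokens generated per second of generation time, and that power means the mean board draw during the run. The `RunMeasurement` class and `compare` helper are hypothetical illustrations, not the paper's measurement harness.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RunMeasurement:
    """One generation run. Field names are hypothetical, not taken from the paper."""
    generated_tokens: int
    wall_time_s: float  # time spent generating, in seconds
    avg_power_w: float  # mean board power during the run, in watts

    def throughput_tok_s(self) -> float:
        # Generation throughput: tokens produced per second.
        return self.generated_tokens / self.wall_time_s

    def energy_per_token_j(self) -> float:
        # Energy (J) = average power (W) x time (s), divided across the generated tokens.
        return self.avg_power_w * self.wall_time_s / self.generated_tokens


def compare(baseline: RunMeasurement, candidate: RunMeasurement) -> Tuple[float, float]:
    """Return (throughput speedup, percent reduction in average power) of candidate vs. baseline."""
    speedup = candidate.throughput_tok_s() / baseline.throughput_tok_s()
    power_reduction_pct = 100.0 * (1.0 - candidate.avg_power_w / baseline.avg_power_w)
    return speedup, power_reduction_pct
```

Under these assumptions, a "4x higher throughput" claim corresponds to a `speedup` of 4.0, and "30-40% lower power usage" corresponds to a `power_reduction_pct` between 30 and 40. Energy per token combines both effects, so a runtime that is faster and draws less power improves on that metric even more.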
Taxonomy
Topics: Big Data and Digital Economy · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques
