Edge Deployment of Small Language Models, a comprehensive comparison of CPU, GPU and NPU backends
Pablo Prieto, Pablo Abad

TL;DR
This paper compares CPU, GPU, and NPU hardware for running small language models at the edge, highlighting that NPUs offer the best performance and energy efficiency for resource-constrained environments.
Contribution
It provides a comprehensive evaluation of different hardware backends for SLM inference, demonstrating the superiority of NPUs in performance and energy efficiency.
Findings
NPUs outperform CPUs and GPUs in inference speed and energy efficiency.
Bandwidth normalization is crucial for fair cross-architecture comparison.
NPUs are the most suitable hardware for edge SLM deployment.
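The bandwidth-normalization finding can be illustrated with a minimal sketch. It assumes normalization means dividing measured decode throughput by the platform's peak memory bandwidth; the summary does not give the paper's exact formula, so this is an illustration rather than the authors' method. The `Run` class, its field names, and the two example devices are hypothetical placeholders, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark run on one device (all fields are hypothetical)."""
    name: str
    tokens_per_s: float    # measured decode throughput
    bandwidth_gb_s: float  # peak memory bandwidth of the platform
    avg_power_w: float     # average power drawn during inference


def bandwidth_normalized_throughput(run: Run) -> float:
    """Tokens/s per GB/s of memory bandwidth (assumed normalization)."""
    return run.tokens_per_s / run.bandwidth_gb_s


def tokens_per_joule(run: Run) -> float:
    """Energy efficiency: (tokens/s) / (J/s) = tokens per joule."""
    return run.tokens_per_s / run.avg_power_w


# Placeholder numbers chosen only to show the effect; not real results.
device_a = Run("device_a", tokens_per_s=20.0, bandwidth_gb_s=50.0, avg_power_w=10.0)
device_b = Run("device_b", tokens_per_s=60.0, bandwidth_gb_s=300.0, avg_power_w=60.0)

for run in (device_a, device_b):
    print(
        f"{run.name}: raw={run.tokens_per_s:.1f} tok/s, "
        f"normalized={bandwidth_normalized_throughput(run):.2f} tok/s per GB/s, "
        f"efficiency={tokens_per_joule(run):.2f} tok/J"
    )
```

In this made-up example, `device_b` is faster in raw tokens per second, yet `device_a` extracts more throughput from each GB/s of bandwidth and from each joule. A raw-throughput ranking alone would hide that difference between platforms with very different memory systems.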
Abstract
Edge computing processes data where it is generated, enabling faster decisions, lower bandwidth usage, and improved privacy. However, edge devices typically operate under strict constraints on processing power, memory, and energy consumption, making them unsuitable for large language models (LLMs). Fortunately, Small Language Models (SLMs) offer lightweight alternatives that bring AI inference to resource-constrained environments by significantly reducing computational cost while remaining suitable for specialization and customization. In this scenario, selecting the hardware platform that best balances performance and efficiency for SLM inference is challenging due to strict resource limitations. To address this issue, this study evaluates the inference performance and energy efficiency of commercial CPUs (Intel and ARM), GPUs (NVIDIA), and NPUs (RaiderChip) for running SLMs. GPUs, the…
Taxonomy
Topics: Big Data and Digital Economy · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques
