Characterizing and Understanding Energy Footprint and Efficiency of Small Language Model on Edges
Md Romyull Islam, Bobin Deng, Nobel Dhar, Tu N. Nguyen, Selena He, Yong Shi, Kun Suo

TL;DR
This paper evaluates the energy efficiency of small language models on edge devices, analyzing their performance, power consumption, and tradeoffs to guide deployment in energy-constrained environments.
Contribution
It provides a comprehensive empirical analysis of small language models (SLMs) on edge hardware, identifying the key factors that affect energy efficiency and offering practical deployment insights.
Findings
Jetson Orin Nano with GPU acceleration offers the highest energy-to-performance ratio
Llama 3.2 balances accuracy and power efficiency effectively
TinyLlama is suitable for low-power scenarios with reduced accuracy
Abstract
Cloud-based large language models (LLMs) and their variants have significantly influenced real-world applications. Deploying small language models (SLMs) on edge devices offers additional advantages, such as reduced latency and independence from network connectivity. However, the limited computing resources and constrained energy budgets of edge devices make efficient deployment challenging. This study evaluates the power efficiency of five representative SLMs, including Llama 3.2, Phi-3 Mini, TinyLlama, and Gemma 2, on Raspberry Pi 5, Jetson Nano, and Jetson Orin Nano (CPU and GPU configurations). Results show that Jetson Orin Nano with GPU acceleration achieves the highest energy-to-performance ratio, significantly outperforming CPU-based setups. Llama 3.2 provides the best balance of accuracy and power efficiency, while TinyLlama is well-suited for low-power environments at the cost…
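The abstract ranks hardware and model configurations by an "energy-to-performance ratio" without defining it here. The sketch below is a minimal illustration, assuming that ratio is approximated as tokens generated per joule, with energy computed as average power multiplied by generation time. The class `InferenceRun` and the function `most_efficient` are hypothetical names, no measured values from the paper are used, and the paper's actual metric and measurement setup may differ.

```python
from dataclasses import dataclass


@dataclass
class InferenceRun:
    """One hypothetical measurement of an SLM generating text on an edge device.

    Assumption: efficiency is summarized as tokens per joule; this is an
    illustrative proxy, not necessarily the paper's exact metric.
    """

    avg_power_w: float      # mean device power during generation, in watts
    duration_s: float       # wall-clock generation time, in seconds
    tokens_generated: int   # number of output tokens produced

    def __post_init__(self) -> None:
        # Reject inputs that would make the ratios undefined or meaningless.
        if self.avg_power_w <= 0 or self.duration_s <= 0 or self.tokens_generated <= 0:
            raise ValueError("power, duration, and token count must be positive")

    def energy_j(self) -> float:
        # Energy (J) = average power (W) x time (s)
        return self.avg_power_w * self.duration_s

    def throughput_tps(self) -> float:
        # Performance proxy: tokens generated per second
        return self.tokens_generated / self.duration_s

    def tokens_per_joule(self) -> float:
        # Efficiency proxy: useful output per unit of energy (higher is better)
        return self.tokens_generated / self.energy_j()

    def joules_per_token(self) -> float:
        # Inverse view: energy cost of each generated token (lower is better)
        return self.energy_j() / self.tokens_generated


def most_efficient(runs: dict[str, InferenceRun]) -> str:
    """Return the label of the configuration with the most tokens per joule."""
    if not runs:
        raise ValueError("no runs to compare")
    return max(runs, key=lambda name: runs[name].tokens_per_joule())
```

Normalizing by tokens rather than by wall-clock time lets a slower but lower-power configuration still rank well, which is the kind of tradeoff the abstract describes between GPU-accelerated and CPU-only setups.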
Taxonomy
Topics: Big Data and Digital Economy · IoT and Edge/Fog Computing · Advanced Neural Network Applications
