Toward Sustainability-Aware LLM Inference on Edge Clusters
Kolichala Rajashekar, Nafiseh Sharghivand, Radu Prodan, Reza Farahani

TL;DR
This paper explores sustainability-aware inference strategies for large language models on edge clusters, balancing latency and carbon footprint through empirical benchmarking and optimized prompt routing.
Contribution
It combines empirical energy benchmarking of edge devices with routing strategies that jointly optimize carbon footprint (sustainability) and inference latency (performance) for LLM inference on edge clusters.
Findings
A batch size of four prompts balances throughput and energy efficiency.
Larger batches risk GPU memory saturation.
Carbon- and latency-aware routing improves sustainability and performance.
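The batch-size finding can be made concrete with a small selection rule over one's own benchmark measurements. The sketch below is a hypothetical illustration, not the authors' procedure: it assumes each batch size has been profiled for latency, energy, and peak GPU memory. It then discards sizes that come too close to memory saturation (the `headroom` fraction is an assumed parameter) and picks the size with the best weighted mix of normalized throughput and prompts per joule.

```python
from dataclasses import dataclass


@dataclass
class BatchProfile:
    """Hypothetical benchmark record for one batch size on one device."""
    batch_size: int      # prompts processed together
    latency_s: float     # wall-clock time to finish the whole batch
    energy_j: float      # measured device energy for the batch, in joules
    peak_mem_gb: float   # peak GPU memory observed during the batch


def pick_batch_size(profiles, mem_limit_gb, weight=0.5, headroom=0.9):
    """Return the batch size that best balances throughput and energy efficiency.

    weight=1.0 optimizes throughput only; weight=0.0 optimizes prompts per joule only.
    Batch sizes whose peak memory exceeds headroom * mem_limit_gb are excluded
    to avoid GPU memory saturation.
    """
    safe = [p for p in profiles if p.peak_mem_gb <= headroom * mem_limit_gb]
    if not safe:
        raise ValueError("no profiled batch size fits within the memory headroom")

    # Normalize both metrics against the best value among the safe candidates.
    max_tput = max(p.batch_size / p.latency_s for p in safe)
    max_eff = max(p.batch_size / p.energy_j for p in safe)

    def score(p):
        tput = (p.batch_size / p.latency_s) / max_tput
        eff = (p.batch_size / p.energy_j) / max_eff
        return weight * tput + (1.0 - weight) * eff

    return max(safe, key=score).batch_size
```

The profiles passed in should come from measurements on the target device (for example, an 8 GB Jetson Orin NX would use `mem_limit_gb=8.0`). The equal default weighting is an arbitrary choice that a deployment could tune.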
Abstract
Large language models (LLMs) require substantial computational resources, leading to significant carbon emissions and operational costs. Although training is energy-intensive, the long-term environmental burden arises from inference, amplified by the massive global query volume. Cloud-based inference offers scalability but suffers from latency and bandwidth constraints due to centralized processing and continuous data transfer. Edge clusters, in contrast, can mitigate these limitations by enabling localized execution, yet they face trade-offs between performance, energy efficiency, and device constraints. This short paper presents a sustainability-aware LLM inference approach for edge clusters comprising NVIDIA Jetson Orin NX (8GB) and NVIDIA Ada 2000 (16GB) devices. It aims to balance inference latency and carbon footprint through carbon- and latency-aware routing strategies, guided by empirical…
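A carbon- and latency-aware router of the kind the abstract describes could, in its simplest form, score each candidate device by a weighted sum of normalized latency and normalized emissions. The sketch below is a hypothetical illustration under stated assumptions, not the paper's routing strategy. It assumes per-request latency and energy estimates come from prior benchmarking, and that each device site has a known grid carbon intensity in gCO2e/kWh. The `alpha` weight and the memory-feasibility check are also assumptions.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical view of one edge device for a single incoming prompt."""
    name: str
    est_latency_s: float      # predicted latency for this prompt (from benchmarks)
    est_energy_wh: float      # predicted energy for this prompt, in watt-hours
    carbon_intensity: float   # grid carbon intensity at the device, gCO2e/kWh
    free_mem_gb: float        # currently free accelerator memory
    required_mem_gb: float    # memory this prompt needs on this device


def carbon_g(d: Device) -> float:
    # Convert Wh to kWh, then multiply by gCO2e/kWh to get grams of CO2e.
    return d.est_energy_wh / 1000.0 * d.carbon_intensity


def route(devices, alpha=0.5):
    """Pick the device minimizing alpha * latency + (1 - alpha) * carbon.

    alpha=1.0 routes purely on latency; alpha=0.0 routes purely on carbon.
    Devices without enough free memory are skipped; returns None if none fit.
    """
    feasible = [d for d in devices if d.free_mem_gb >= d.required_mem_gb]
    if not feasible:
        return None

    # Normalize each metric by its maximum over the feasible devices.
    max_lat = max(d.est_latency_s for d in feasible)
    max_co2 = max(carbon_g(d) for d in feasible)

    def score(d: Device) -> float:
        lat = d.est_latency_s / max_lat if max_lat > 0 else 0.0
        co2 = carbon_g(d) / max_co2 if max_co2 > 0 else 0.0
        return alpha * lat + (1.0 - alpha) * co2

    return min(feasible, key=score).name
```

Sweeping `alpha` between 0 and 1 traces the latency/carbon trade-off. Normalizing by the per-request maximum keeps the two terms on a comparable scale without fixed units.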
Taxonomy
Topics: Big Data and Digital Economy · Advanced Neural Network Applications · Machine Learning in Materials Science
