EdgeReasoning: Characterizing Reasoning LLM Deployment on Edge GPUs

Benjamin Kubwimana; Qijing Huang

arXiv:2511.01866 · cs.DC · November 5, 2025

Open Access

TL;DR

EdgeReasoning provides a comprehensive analysis of deploying reasoning large language models on edge GPUs, balancing latency, accuracy, and resource constraints to guide optimal deployment strategies.

Contribution

The study systematically characterizes latency-accuracy tradeoffs and evaluates techniques for optimizing reasoning LLM deployment on edge GPUs, addressing the scarcity of guidance on how to combine model type, model size, token budget, and test-time scaling.

Findings

01. Mapped the Pareto frontier of accuracy and latency configurations.

02. Evaluated prompt and tuning techniques for token reduction.

03. Profiled test-time scaling methods for latency optimization.

Abstract

The edge intelligence paradigm is increasingly in demand for emerging autonomous systems such as robotics. Beyond ensuring privacy-preserving operation and resilience in connectivity-limited environments, edge deployment offers significant energy and cost advantages over cloud-based solutions. However, deploying large language models (LLMs) for reasoning tasks on edge GPUs faces critical challenges from strict latency constraints and limited computational resources. To navigate these constraints, developers must balance multiple design factors - choosing reasoning versus non-reasoning architectures, selecting appropriate model sizes, allocating token budgets, and applying test-time scaling strategies - to meet target latency and optimize accuracy. Yet guidance on optimal combinations of these variables remains scarce. In this work, we present EdgeReasoning, a comprehensive study…
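The selection problem the abstract describes, meeting a target latency while optimizing accuracy, can be read as picking a point on a Pareto frontier of configurations. The Python sketch below is only an illustration of that idea and is not the authors' code. The `Config` class, the function names, the configuration labels, and every latency and accuracy value are hypothetical placeholders, not measurements from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Config:
    """One deployment choice. All fields here are hypothetical."""
    name: str         # illustrative label, e.g. model size + reasoning mode
    latency_s: float  # end-to-end latency in seconds (made-up value)
    accuracy: float   # task accuracy in [0, 1] (made-up value)


def pareto_frontier(configs: List[Config]) -> List[Config]:
    """Return the configs that no other config beats on both axes.

    After sorting by latency (ties broken by higher accuracy first),
    a config is on the frontier only if it is strictly more accurate
    than every faster config seen so far.
    """
    frontier: List[Config] = []
    best_accuracy = float("-inf")
    for cfg in sorted(configs, key=lambda c: (c.latency_s, -c.accuracy)):
        if cfg.accuracy > best_accuracy:
            frontier.append(cfg)
            best_accuracy = cfg.accuracy
    return frontier


def best_under_budget(configs: List[Config], budget_s: float) -> Optional[Config]:
    """Pick the most accurate frontier config within the latency budget."""
    feasible = [c for c in pareto_frontier(configs) if c.latency_s <= budget_s]
    return max(feasible, key=lambda c: c.accuracy) if feasible else None


# Placeholder candidates: NOT results from the paper.
CANDIDATES = [
    Config("small-nonreasoning", 1.0, 0.40),
    Config("small-reasoning-short-budget", 3.0, 0.55),
    Config("medium-nonreasoning", 4.0, 0.50),       # slower and less accurate
    Config("medium-reasoning-long-budget", 12.0, 0.75),
    Config("small-reasoning-with-voting", 15.0, 0.70),  # slower and less accurate
]

if __name__ == "__main__":
    for cfg in pareto_frontier(CANDIDATES):
        print(cfg.name, cfg.latency_s, cfg.accuracy)
    choice = best_under_budget(CANDIDATES, budget_s=5.0)
    print("chosen:", choice.name if choice else "none fits the budget")
```

With these placeholder numbers, a 5-second budget selects `small-reasoning-short-budget`, and a budget below 1 second returns `None` because no candidate is feasible. In practice the candidate list would come from profiling each combination of model type, model size, token budget, and test-time scaling setting on the target device.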

Taxonomy

Topics: IoT and Edge/Fog Computing · Big Data and Digital Economy · Ferroelectric and Negative Capacitance Devices