A Systematic Characterization of LLM Inference on GPUs

Haonan Wang; Xuxin Xiao; Mingyu Yan; Zhuoyuan Zhu; Dengke Han; Duo Wang; Wenming Li; Xiaochun Ye; Cunchen Hu; Hongyang Chen; Guangyu Sun

arXiv:2512.01644 · cs.AR · December 2, 2025


TL;DR

This paper systematically analyzes LLM inference on GPUs. It establishes an analytical framework that identifies the hardware root causes of observed performance behavior, derives system scaling principles, and explores emerging paradigms, offering empirical insights and optimization guidance.

Contribution

It introduces a comprehensive four-dimensional framework for understanding LLM inference on GPUs, combining empirical observations with hardware analysis and future paradigm exploration.

Findings

01. Identifies performance phenomena in LLM inference
02. Reveals hardware root causes affecting performance
03. Provides practical optimization strategies

Abstract

This work presents a systematic characterization of Large Language Model (LLM) inference to address the currently fragmented understanding of this workload. Through comprehensive experiments, we establish a four-dimensional analytical framework: (1) Two-Phase Heterogeneity Observation; (2) Microarchitectural Root Cause Analysis; (3) System Scaling Principles; and (4) Emerging Paradigm Boundaries. Our investigation progresses systematically from observation to foresight: identifying performance phenomena, revealing hardware causes, validating system behavior, and exploring new paradigms. This study not only consolidates a reliable empirical foundation for existing research but also provides new discoveries and practical optimization guidance for LLM inference.
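The abstract does not name the two phases behind "Two-Phase Heterogeneity". In the LLM inference literature they usually refer to prefill (the whole prompt processed at once) and decode (one new token per step). Under that assumption, the sketch below uses a simple roofline model to illustrate why the two phases can stress a GPU differently. It computes the arithmetic intensity of a single square linear layer and compares it with a ridge point. It is not the authors' methodology. The `Gpu` class, the helper functions, the layer shape, and the A100-class peak figures are all hypothetical, illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    """Hypothetical GPU description used only for this illustration."""
    peak_flops: float     # peak throughput, FLOP/s
    mem_bandwidth: float  # DRAM bandwidth, bytes/s

    @property
    def ridge_point(self) -> float:
        # Arithmetic intensity (FLOP/byte) at which the roofline switches
        # from memory-bound to compute-bound.
        return self.peak_flops / self.mem_bandwidth


def linear_layer_intensity(tokens: int, d_model: int, bytes_per_elem: int = 2) -> float:
    """Arithmetic intensity of y = x @ W, x: (tokens, d_model), W: (d_model, d_model).

    Simplifying assumption: W, x and y each move between DRAM and the GPU
    exactly once (no re-reads, no caching effects), FP16 by default.
    """
    flops = 2 * tokens * d_model * d_model  # one multiply + one add per MAC
    bytes_moved = bytes_per_elem * (d_model * d_model + 2 * tokens * d_model)
    return flops / bytes_moved


def classify(intensity: float, gpu: Gpu) -> str:
    """Label a kernel by which side of the ridge point it falls on."""
    return "compute-bound" if intensity >= gpu.ridge_point else "memory-bound"


if __name__ == "__main__":
    # Roughly A100-class FP16 figures; illustrative assumptions, not measurements.
    gpu = Gpu(peak_flops=312e12, mem_bandwidth=2.0e12)
    d_model = 4096
    prefill = linear_layer_intensity(tokens=2048, d_model=d_model)  # whole prompt at once
    decode = linear_layer_intensity(tokens=1, d_model=d_model)      # one token per step
    for name, ai in [("prefill", prefill), ("decode", decode)]:
        print(f"{name}: {ai:.1f} FLOP/byte -> {classify(ai, gpu)}")
```

With these assumed numbers the ridge point is 156 FLOP/byte. The 2048-token prefill reaches 1024 FLOP/byte, which is compute-bound. A single-token decode step sits near 1 FLOP/byte because the whole weight matrix is streamed to produce one output row, so it is memory-bound. Real kernels, batching, and KV-cache traffic shift these numbers, which is the kind of behavior a measurement study characterizes empirically.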

