Statistical Modeling and Uncertainty Estimation of LLM Inference Systems

Kaustabha Ray; Nelson Mimura Gonzalez; Bruno Wassermann; Rachel Tzoref-Brill; Dean H. Lorenz

arXiv:2505.09319 · cs.PF · May 15, 2025

TL;DR

This paper introduces the ALA framework, which combines analytical models with machine learning to predict performance and quantify uncertainty in large language model inference systems across diverse workloads.

Contribution

The paper presents a novel hybrid analytical-ML approach with uncertainty estimation for robust performance prediction in LLM inference workloads.

Findings

01. Achieves low median prediction errors across diverse workloads

02. Effectively extends performance predictions to unobserved configurations

03. Provides uncertainty quantification based on workload similarity

Abstract

Large Language Model (LLM) inference systems present significant challenges in statistical performance characterization due to dynamic workload variations, diverse hardware architectures, and complex interactions between model size, batch processing, and throughput requirements. Accurate statistical characterization enables better workload scheduling, adaptive resource provisioning, and cost-aware inference optimization, making it crucial for improving efficiency in large-scale AI deployments. Traditional analytical models provide explainability but cannot cover the vast diversity of real-world workloads, making it impossible to benchmark every scenario in advance. Machine learning (ML) approaches effectively predict performance for non-benchmarked cases but struggle when extrapolating beyond their observed training space. To address these limitations for LLM inference systems, we…

Taxonomy

Topics: Fault Detection and Control Systems