Adaptively Robust LLM Inference Optimization under Prediction Uncertainty

Zixi Chen; Yinyu Ye; Zijie Zhou

arXiv:2508.14544 · cs.LG · September 3, 2025

TL;DR

This paper introduces adaptive algorithms for optimizing large language model inference scheduling under uncertain output lengths, significantly improving efficiency and robustness compared to conservative methods.

Contribution

It proposes a novel adaptive scheduling algorithm, $\text{A}_{\text{min}}$, that dynamically refines output length estimates, achieving near-optimal performance with only lower-bound predictions.
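To make the idea of refining an output-length estimate from a lower bound concrete, here is a minimal Python sketch of a toy scheduler. It is not the paper's algorithm: the `Request` class, the `simulate` function, the memory model (one unit per prompt or generated token), the shortest-estimate-first admission rule, and the eviction rule are all assumptions made for illustration. The only element taken from the summary above is that the scheduler starts from the predicted lower bound and raises its estimate as a request keeps generating tokens.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # identity comparison, so list.remove targets the exact object
class Request:
    prompt_len: int    # known on arrival
    lower_bound: int   # predicted minimum output length
    true_len: int      # hidden from the scheduler; used only to detect completion
    generated: int = 0
    estimate: int = 0  # scheduler's current belief about the output length

    def __post_init__(self):
        # Start by treating the predicted lower bound as the output length.
        self.estimate = self.lower_bound


def simulate(requests, memory_limit):
    """Return the sum of completion times when all requests arrive at time 0."""
    for r in requests:
        assert 1 <= r.lower_bound <= r.true_len
        assert r.prompt_len + r.true_len <= memory_limit  # each fits alone
    waiting, running = list(requests), []
    clock = total_latency = 0
    while waiting or running:
        # Admission: smallest estimate first, while projected peak memory fits.
        waiting.sort(key=lambda r: r.estimate)
        still_waiting = []
        for r in waiting:
            projected = sum(x.prompt_len + x.estimate for x in running)
            if projected + r.prompt_len + r.estimate <= memory_limit:
                running.append(r)
            else:
                still_waiting.append(r)
        waiting = still_waiting
        # Eviction (assumed rule): if the next token would overflow memory,
        # restart the request that has generated the fewest tokens.
        while len(running) > 1 and sum(
            x.prompt_len + x.generated + 1 for x in running
        ) > memory_limit:
            victim = min(running, key=lambda x: x.generated)
            running.remove(victim)
            victim.generated = 0  # work is lost, but the refined estimate is kept
            waiting.append(victim)
        # Decode one token for every running request.
        clock += 1
        unfinished = []
        for r in running:
            r.generated += 1
            if r.generated == r.true_len:
                total_latency += clock
            else:
                # Refinement: the request outlived its estimate, so raise it.
                r.estimate = max(r.estimate, r.generated + 1)
                unfinished.append(r)
        running = unfinished
    return total_latency
```

Calling `simulate([Request(2, 1, 3), Request(2, 2, 2)], memory_limit=5)` runs the two requests one after the other, because their combined estimates do not fit in memory. Keeping the refined estimate after an eviction is one possible design choice; it prevents the scheduler from repeating the same over-optimistic admission.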

Findings

01

$\text{A}_{\text{min}}$ achieves a log-scale competitive ratio.

02

Numerical simulations show $\text{A}_{\text{min}}$ performs close to the hindsight scheduler.

03

The approach improves scheduling robustness under prediction uncertainty.

Abstract

We study the problem of optimizing Large Language Model (LLM) inference scheduling to minimize total latency. LLM inference is an online, multi-task, and energy-intensive service process in which a pre-trained LLM processes input requests and generates output tokens sequentially. It is therefore vital to improve scheduling efficiency and reduce power consumption when large numbers of prompt requests arrive. A key challenge in LLM inference scheduling is that while the prompt length is known upon arrival, the output length, which critically impacts memory usage and processing time, is unknown. To address this uncertainty, we propose algorithms that leverage machine learning to predict output lengths, assuming the prediction provides an interval classification (min-max range) for each request. We first design a conservative algorithm,…

Taxonomy

Topics: Big Data and Digital Economy · Natural Language Processing Techniques · Machine Learning in Materials Science