Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs

Hao Kang; Qingru Zhang; Han Cai; Weiyuan Xu; Tushar Krishna; Yilun Du; Tsachy Weissman

arXiv:2505.19481·cs.LG·May 27, 2025

TL;DR

This paper investigates the trade-off between speed and accuracy in latency-sensitive decisions made by large language models, introducing benchmarks and an adaptive framework to optimize performance in real-time tasks.

Contribution

The paper presents the first systematic study of the latency-quality trade-off in LLM-based agents for real-time decision making, introduces two benchmarks (HFTBench and StreetFighter), and proposes an adaptive model selection framework.

Findings

01 The optimal latency-quality balance varies by task.

02 Sacrificing some quality for lower latency can significantly improve downstream performance.

03 The proposed framework outperforms baselines on the benchmarks.
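
The first two findings can be illustrated with a toy calculation. The sketch below is not the paper's framework or its benchmarks: the candidate names, accuracies, latencies, and the linear time-decay reward are all hypothetical values chosen only to show how a less accurate but faster option can earn more once delay is penalized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str         # hypothetical configuration label
    accuracy: float   # assumed probability that a decision is correct
    latency_s: float  # assumed seconds needed to produce a decision


def expected_reward(c: Candidate, decay_per_s: float) -> float:
    """Toy reward: a correct decision loses value linearly with latency."""
    time_value = max(0.0, 1.0 - decay_per_s * c.latency_s)
    return c.accuracy * time_value


def pick_best(candidates, decay_per_s: float) -> Candidate:
    """Return the candidate with the highest toy expected reward."""
    return max(candidates, key=lambda c: expected_reward(c, decay_per_s))


# Hypothetical numbers, not measurements from the paper.
CANDIDATES = [
    Candidate("small-fast", accuracy=0.70, latency_s=0.2),
    Candidate("large-slow", accuracy=0.90, latency_s=1.0),
]

if __name__ == "__main__":
    for decay in (0.1, 0.5):  # mild vs. strict latency penalty
        best = pick_best(CANDIDATES, decay)
        print(f"decay={decay}: best={best.name}")
```

Under the mild penalty the slower, more accurate candidate scores higher; under the strict penalty the faster one does. This mirrors, in a simplified way, the observation that the best balance depends on the task.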

Abstract

Large language models (LLMs) have shown remarkable performance across diverse reasoning and generation tasks, and are increasingly deployed as agents in dynamic environments such as code generation and recommendation systems. However, many real-world applications, such as high-frequency trading and real-time competitive gaming, require decisions under strict latency constraints, where faster responses directly translate into higher rewards. Despite the importance of this latency-quality trade-off, it remains underexplored in the context of LLM-based agents. In this work, we present the first systematic study of this trade-off in real-time decision-making tasks. To support our investigation, we introduce two new benchmarks: HFTBench, a high-frequency trading simulation, and StreetFighter, a competitive gaming platform. Our analysis reveals that optimal latency-quality balance varies by…

Code & Models

Repositories

haokang-timmy/latencysensitivebench · Official

Videos

Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs · SlidesLive

Taxonomy

Topics: Semantic Web and Ontologies · Business Process Modeling and Analysis

Methods: Attentive Walk-Aggregating Graph Neural Network