Accuracy Is Speed: Towards Long-Context-Aware Routing for Distributed LLM Serving

Takeshi Yoshimura; Valentijn Dymphnus van de Beek; Tatsuhiro Chiba

arXiv:2604.15732·cs.DC·April 20, 2026

TL;DR

This paper introduces Time-to-Correct-Answer (TTCA), a metric for the wall-clock time needed to obtain the first correct response in long-context distributed LLM serving, and argues that once retries are counted, accuracy directly determines user-visible speed.

Contribution

It demonstrates Lightweight Accuracy-Aware Routing (LAAR), a capability-based routing design that reduces TTCA by treating accuracy as a key system objective.

Findings

01

Prompt characteristics such as length and language amplify accuracy variance, which in turn inflates TTCA.

02

LAAR reduces TTCA in long-context distributed LLM serving.

03

Accuracy-aware routing improves overall response reliability.

Abstract

Distributed LLM serving systems optimize per-request latency and throughput. However, under long-context workloads, inference accuracy becomes more variable. When incorrect responses trigger retries, accuracy directly translates into cumulative user-visible delay that is not captured by single-shot latency metrics. In this work, we argue that under long-context serving, accuracy becomes speed through retry dynamics. We introduce Time-to-Correct-Answer (TTCA), a metric that measures the wall-clock time required to obtain the first correct response. Our measurement study shows that prompt characteristics such as length and language amplify accuracy variance, which inflates TTCA. We demonstrate Lightweight Accuracy-Aware Routing (LAAR), a capability-based routing design that reduces TTCA. Our results suggest that in long-context distributed serving, accuracy…
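
To make the retry argument concrete, the following is a minimal Python sketch of how TTCA could be computed from a log of attempts. It is not the authors' implementation: the `Attempt` record, the `ttca` and `expected_ttca` helpers, and the assumption that retries run back to back with independent, identically distributed outcomes are all illustrative choices made here. Under that assumption the number of attempts is geometric, so the expected TTCA is the per-attempt latency divided by the probability of a correct response.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Attempt:
    """One request attempt (hypothetical record, not from the paper)."""
    latency_s: float  # wall-clock time of this attempt, in seconds
    correct: bool     # whether the response was judged correct


def ttca(attempts: Iterable[Attempt]) -> Optional[float]:
    """Sum attempt latencies up to and including the first correct one.

    Assumes retries are issued back to back, so gaps between attempts
    are ignored. Returns None if no attempt was correct.
    """
    elapsed = 0.0
    for attempt in attempts:
        elapsed += attempt.latency_s
        if attempt.correct:
            return elapsed
    return None


def expected_ttca(latency_s: float, p_correct: float) -> float:
    """Expected TTCA assuming independent attempts with fixed latency.

    The attempt count is geometric with mean 1 / p_correct.
    """
    if not 0.0 < p_correct <= 1.0:
        raise ValueError("p_correct must be in (0, 1]")
    return latency_s / p_correct


if __name__ == "__main__":
    # One failed 2 s attempt followed by a correct 3 s attempt.
    print(ttca([Attempt(2.0, False), Attempt(3.0, True)]))
    # A faster but less accurate backend versus a slower, accurate one.
    print(expected_ttca(4.0, 0.5), expected_ttca(6.0, 1.0))
```

In this toy comparison the backend with 4 s latency and 50% accuracy has a higher expected TTCA than the one with 6 s latency that is always correct, which is the sense in which single-shot latency can understate user-visible delay.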
