Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation

Joon Ha Kim; Geon-Woo Kim; Anoop Rachakonda; Daehyeok Kim

arXiv:2605.07985 · cs.DC · May 22, 2026

TL;DR

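Dooly is a profiling tool for LLM inference simulation. It cuts profiling cost by exploiting the structure of operation inputs: dimensions fixed by the model configuration recur across models, so a single sweep over the request-dependent dimensions yields latency estimates that can be reused across many configurations.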

Contribution

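Dooly introduces a structure-aware profiling method. It treats each operation's input dimensions as either fixed by the model configuration or determined by the incoming request, and uses a single sweep over the request-dependent dimensions to serve every configuration that runs the same operation.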

Findings

01

Keeps mean absolute percentage error (MAPE) within 5% for time to first token (TTFT) and within 8% for time per output token (TPOT).

02

Reduces profiling GPU-hours by 56.4% across 12 models.

03

Works across multiple GPU platforms and attention backends.
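
For readers unfamiliar with the accuracy metric in the first finding, the following is a minimal sketch of how MAPE is commonly computed between simulated and measured latencies. The function name `mape` and the sample TTFT values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def mape(predicted, measured):
    """Mean absolute percentage error between two equal-length sequences, in percent."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and the same length")
    total = sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured))
    return 100.0 * total / len(measured)


# Hypothetical TTFT values in milliseconds, made up for illustration only.
measured_ttft = [100.0, 200.0, 400.0]
simulated_ttft = [105.0, 190.0, 400.0]

ttft_error = mape(simulated_ttft, measured_ttft)
print(f"TTFT MAPE: {ttft_error:.2f}%")
```

A lower value means the simulator's estimates track the measured latencies more closely; the finding above reports this error staying within 5% for TTFT and 8% for TPOT.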

Abstract

Selecting the optimal LLM inference configuration requires evaluation across hardware, serving engines, attention backends, and model architectures, since no single choice performs best across all workloads. Profile-based simulators are the standard tool, yet they hardcode their operation set to a specific configuration and re-profile every operation from scratch, making exploration prohibitively expensive. This cost stems from a missing structural understanding: every input dimension of each operation is fixed by the model configuration or determined by the incoming request. Many model-configuration values (e.g., head size, layer count) recur across models, so the same operation runs in many configurations; a single sweep over the request-dependent dimensions can serve them all. We present Dooly, which exploits this structure to achieve configuration-agnostic, redundancy-aware…
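
The redundancy argument in the abstract can be illustrated with a small sketch. This is not Dooly's implementation: the model names, configuration values, the choice of `head_size` and `num_heads` as the configuration-fixed dimensions, and the `fake_measure` cost function are all assumptions made for the example. It only shows the general idea of grouping models by an operation's configuration-fixed dimensions and sweeping the request-dependent dimension once per group.

```python
from collections import defaultdict

# Hypothetical model configurations; values are illustrative, not from the paper.
MODELS = {
    "model_a": {"head_size": 128, "num_heads": 32},
    "model_b": {"head_size": 128, "num_heads": 32},
    "model_c": {"head_size": 64, "num_heads": 16},
}


def attention_signature(cfg):
    """Configuration-fixed dimensions of one operation (an assumed choice)."""
    return ("attention", cfg["head_size"], cfg["num_heads"])


def plan_profiling(models):
    """Group models that run the same operation with the same fixed dimensions."""
    groups = defaultdict(list)
    for name, cfg in models.items():
        groups[attention_signature(cfg)].append(name)
    return dict(groups)


def sweep(signature, seq_lens, measure):
    """Sweep the request-dependent dimension once for a given signature."""
    return {seq_len: measure(signature, seq_len) for seq_len in seq_lens}


def fake_measure(signature, seq_len):
    """Placeholder cost function standing in for a real GPU measurement."""
    _, head_size, num_heads = signature
    return 1e-6 * head_size * num_heads * seq_len


plan = plan_profiling(MODELS)
profiles = {sig: sweep(sig, [128, 256, 512], fake_measure) for sig in plan}

naive_sweeps = len(MODELS)     # one sweep per model
shared_sweeps = len(profiles)  # one sweep per distinct signature
print(f"sweeps: {naive_sweeps} per-model vs {shared_sweeps} shared")
```

In this toy setup, `model_a` and `model_b` share a signature, so one sweep serves both; the saving grows with the number of models that repeat the same configuration values.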

Code & Models

Repositories

dooly-project (GitHub)
