Revati: Transparent GPU-Free Time-Warp Emulation for LLM Serving
Amey Agrawal, Mayank Yadav, Sukrit Kumar, Anirudha Agrawal, Garv Ghai, Souradeep Bera, Elton Pinto, Sirish Gambhira, Mohammad Adain, Kasra Sohrab, Chus Antonanzas, Alexey Tumanov

TL;DR
Revati is a GPU-free time-warp emulator that enables fast, accurate performance modeling of large language model serving systems by virtualizing GPU execution and allowing rapid simulation of different configurations.
Contribution
Revati introduces a novel time-warp emulation approach that directly executes real serving system code, virtualizes GPU management, and performs fast-forwarded simulation without requiring physical GPUs.
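The idea of replacing kernel execution with a jump in virtual time can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not Revati's implementation: the names VirtualClock and EmulatedDevice, the kernel names, and the predicted durations are all hypothetical, and a real system would intercept CUDA calls rather than expose a Python class.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualClock:
    """Virtual time in seconds; advanced by jumps, never by sleeping."""
    now: float = 0.0

    def jump(self, duration: float) -> float:
        if duration < 0:
            raise ValueError("duration must be non-negative")
        self.now += duration
        return self.now


@dataclass
class EmulatedDevice:
    """Hypothetical stand-in for a GPU: a launch costs virtual time only."""
    clock: VirtualClock
    # Kernel name -> predicted duration in seconds (assumed to be given,
    # e.g. by some runtime predictor; the values below are made up).
    predicted_durations: dict
    launches: list = field(default_factory=list)

    def launch(self, kernel: str) -> None:
        # No kernel runs; we record the launch and fast-forward the clock.
        self.launches.append(kernel)
        self.clock.jump(self.predicted_durations[kernel])


clock = VirtualClock()
device = EmulatedDevice(clock, {"attention": 0.004, "mlp": 0.006})
for _ in range(3):  # three illustrative "layers"
    device.launch("attention")
    device.launch("mlp")
print(f"virtual time elapsed: {clock.now:.3f} s")
```

The serving code that calls launch is unchanged in spirit: it still issues the same sequence of work, but wall-clock cost is decoupled from the virtual time it observes.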
Findings
Achieves less than 5% prediction error across models and configurations
Runs 5-17x faster than real GPU execution
Reduces the cost and time of testing LLM serving configurations, since no physical GPUs are required
Abstract
Deploying LLMs efficiently requires testing hundreds of serving configurations, but evaluating each one on a GPU cluster takes hours and costs thousands of dollars. Discrete-event simulators are faster and cheaper, but they require re-implementing the serving system's control logic -- a burden that compounds as frameworks evolve. We present Revati, a time-warp emulator that enables performance modeling by directly executing real serving system code at simulation-like speed. The system intercepts CUDA API calls to virtualize device management, allowing serving frameworks to run without physical GPUs. Instead of executing GPU kernels, it performs time jumps -- fast-forwarding virtual time by predicted kernel durations. We propose a coordination protocol that synchronizes these jumps across distributed processes while preserving causality. On vLLM and SGLang, Revati achieves less than 5%…
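To make the causality requirement concrete, here is a minimal sketch of one way time jumps could be coordinated across several workers. It is an assumption for illustration, not the protocol from the paper: it uses a plain conservative scheme in which global virtual time only ever advances to the earliest pending wake-up time, so no worker observes an event from its own future. The function name run_time_warp, the worker names, and the durations (integer microseconds) are hypothetical.

```python
import heapq


def run_time_warp(workers):
    """Advance virtual time across workers without violating causality.

    workers: dict mapping a worker name to a list of predicted step
    durations in integer microseconds (hypothetical inputs).
    Returns a list of (virtual_time, worker, step_index) tuples in the
    order the steps complete in virtual time.
    """
    pending = []  # min-heap of (wake_time, worker, step_index)
    for name, durations in sorted(workers.items()):
        if durations:
            heapq.heappush(pending, (durations[0], name, 0))

    now = 0
    events = []
    while pending:
        # Jump straight to the earliest wake-up; nothing can happen sooner.
        wake, name, step = heapq.heappop(pending)
        now = wake
        events.append((now, name, step))
        nxt = step + 1
        if nxt < len(workers[name]):
            # The worker's next step starts at the current virtual time.
            heapq.heappush(pending, (now + workers[name][nxt], name, nxt))
    return events


events = run_time_warp({"rank0": [2000, 1000], "rank1": [1500, 3000]})
for t, name, step in events:
    print(f"t={t}us {name} finished step {step}")
```

Because the clock always moves to the minimum pending time, the emitted timestamps are non-decreasing, which is the property a causality-preserving scheme has to guarantee.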
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Software System Performance and Reliability
