ScaleSim: Serving Large-Scale Multi-Agent Simulation with Invocation Distance-Based Memory Management
Zaifeng Pan, Yipeng Shen, Zhengding Hu, Zhuang Wang, Aninda Manocha, Zheng Wang, Zhongkai Yu, Yue Guan, Yufei Ding

TL;DR
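ScaleSim is a memory-efficient LLM serving system for large-scale multi-agent simulations. It uses invocation distance, an estimate of the relative order in which agents will issue future LLM requests, to drive proactive prefetching and priority-based eviction of agent state on the GPU, achieving up to 1.74x speedup over SGLang.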
Contribution
We introduce invocation distance, a novel abstraction, and develop ScaleSim, a system that enhances memory management and performance in large-scale multi-agent LLM simulations.
Findings
Achieves up to 1.74x speedup over SGLang.
Effectively manages GPU memory for large agent populations.
Supports diverse agent-specific memory with modular design.
Abstract
LLM-based multi-agent simulations are increasingly adopted across application domains, but remain difficult to scale due to GPU memory pressure. Each agent maintains private GPU-resident states, including models, prefix caches, and adapters, which quickly exhaust device memory as the agent count grows. We identify two key properties of these workloads: sparse agent activation and an estimable agent invocation order. Based on an analysis of representative workload classes, we introduce invocation distance, a unified abstraction that estimates the relative order in which agents will issue future LLM requests. Leveraging this abstraction, we present ScaleSim, a memory-efficient LLM serving system for large-scale multi-agent simulations. ScaleSim enables proactive prefetching and priority-based eviction, supports diverse agent-specific memory through a modular interface, and achieves up to…
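To illustrate how an invocation-distance estimate could drive both prefetching and eviction, here is a minimal sketch. It is not ScaleSim's implementation: the `Agent` record, the `plan_residency` function, the abstract memory units, and the greedy fill policy are all assumptions made for this example. The idea shown is only that agents expected to issue requests sooner (smaller distance) are kept on the GPU, and agents expected to run later are the first candidates for eviction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    """Hypothetical record for one agent's GPU-resident state."""
    agent_id: str
    size: int        # memory footprint in arbitrary units (assumed known)
    distance: float  # estimated invocation distance; smaller means sooner


def plan_residency(agents, resident, capacity):
    """Return (evict, prefetch) lists of agent ids.

    Greedy illustration: walk agents from nearest to farthest invocation
    distance and keep each one that still fits in `capacity`. Anything
    resident but not selected is evicted; anything selected but not
    resident is prefetched.
    """
    target = set()
    used = 0
    # Ties on distance are broken by agent_id to keep the plan deterministic.
    for agent in sorted(agents, key=lambda a: (a.distance, a.agent_id)):
        if used + agent.size <= capacity:
            target.add(agent.agent_id)
            used += agent.size
    evict = sorted(set(resident) - target)
    prefetch = sorted(target - set(resident))
    return evict, prefetch


# Example: four equally sized agents, room for two on the device.
agents = [
    Agent("a", size=4, distance=0),
    Agent("b", size=4, distance=5),
    Agent("c", size=4, distance=1),
    Agent("d", size=4, distance=9),
]
evict, prefetch = plan_residency(agents, resident={"b", "d"}, capacity=8)
print("evict:", evict)        # agents expected to run late leave the GPU
print("prefetch:", prefetch)  # agents expected to run soon are loaded
```

In this sketch the plan is recomputed from scratch each time; a real serving system would presumably update distances incrementally and overlap transfers with computation, which the example does not attempt to model.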
Taxonomy
Topics: Simulation Techniques and Applications · Multi-Agent Systems and Negotiation · Parallel Computing and Optimization Techniques
