ScaleSim: Serving Large-Scale Multi-Agent Simulation with Invocation Distance-Based Memory Management

Zaifeng Pan; Yipeng Shen; Zhengding Hu; Zhuang Wang; Aninda Manocha; Zheng Wang; Zhongkai Yu; Yue Guan; Yufei Ding

arXiv:2601.21473 · cs.AI · January 30, 2026

TL;DR

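ScaleSim is a memory-efficient LLM serving system for large-scale multi-agent simulations. It uses invocation distance, an estimate of the relative order in which agents will issue future LLM requests, to guide prefetching and eviction of agent-specific GPU state, and achieves up to 1.74x speedup over SGLang.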

Contribution

We introduce invocation distance, an abstraction that estimates the relative order in which agents will issue future LLM requests, and build ScaleSim, an LLM serving system that uses it to manage GPU memory in large-scale multi-agent simulations.

Findings

01 Achieves up to 1.74x speedup over SGLang.

02 Relieves GPU memory pressure for large agent populations through proactive prefetching and priority-based eviction.

03 Supports diverse agent-specific memory (models, prefix caches, adapters) through a modular interface.

Abstract

LLM-based multi-agent simulations are increasingly adopted across application domains, but remain difficult to scale due to GPU memory pressure. Each agent maintains private GPU-resident states, including models, prefix caches, and adapters, which quickly exhaust device memory as the agent count grows. We identify two key properties of these workloads: sparse agent activation and an estimable agent invocation order. Based on an analysis of representative workload classes, we introduce invocation distance, a unified abstraction that estimates the relative order in which agents will issue future LLM requests. Leveraging this abstraction, we present ScaleSim, a memory-efficient LLM serving system for large-scale multi-agent simulations. ScaleSim enables proactive prefetching and priority-based eviction, supports diverse agent-specific memory through a modular interface, and achieves up to…
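As a rough illustration of how an invocation-distance estimate could drive prefetching and priority-based eviction, here is a minimal Python sketch. It is not ScaleSim's implementation: the names `AgentState`, `DistanceCache`, and `prefetch`, the abstract size units, and the assumption that a distance estimate is already available for each agent are all hypothetical. The policy shown (load the agents expected to run soonest, evict the ones expected to run latest) is one plausible reading of the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    agent_id: str
    size: int        # GPU footprint in arbitrary units (hypothetical)
    distance: float  # estimated invocation distance; smaller = runs sooner


@dataclass
class DistanceCache:
    """Toy model of GPU memory holding per-agent state (illustrative only)."""
    capacity: int
    resident: dict = field(default_factory=dict)

    def used(self) -> int:
        return sum(s.size for s in self.resident.values())

    def prefetch(self, candidates):
        """Load candidates in ascending distance order; return loaded ids.

        A resident state is evicted only if its distance is larger than the
        candidate's, i.e. it is expected to be invoked later.
        """
        loaded = []
        for cand in sorted(candidates, key=lambda s: s.distance):
            if cand.agent_id in self.resident:
                continue
            while self.used() + cand.size > self.capacity:
                victim = max(self.resident.values(),
                             key=lambda s: s.distance, default=None)
                if victim is None or victim.distance <= cand.distance:
                    break  # nothing resident is farther away than cand
                del self.resident[victim.agent_id]
            if self.used() + cand.size <= self.capacity:
                self.resident[cand.agent_id] = cand
                loaded.append(cand.agent_id)
        return loaded


cache = DistanceCache(capacity=2)
first = cache.prefetch([
    AgentState("a", 1, 5.0),
    AgentState("b", 1, 1.0),
    AgentState("c", 1, 3.0),
])
second = cache.prefetch([AgentState("d", 1, 0.0)])
```

In this toy run, the first call keeps the two nearest agents and skips the farthest one; the second call displaces the resident agent with the largest distance to make room for an agent that is about to run. With non-uniform sizes the loop may evict a state and still fail to fit the candidate, which a real system would want to avoid.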

Taxonomy

Topics: Simulation Techniques and Applications · Multi-Agent Systems and Negotiation · Parallel Computing and Optimization Techniques