AI for Distributed Systems Design: Scalable Cloud Optimization Through Repeated LLMs Sampling And Simulators

Jacopo Tagliabue

arXiv:2510.18897 · cs.DC · October 23, 2025

TL;DR

This paper combines large language models with a domain-specific simulator to iteratively generate and verify distributed-systems policies, with the aim of improving scalability and performance.

Contribution

The paper introduces a generate-and-verify framework that pairs LLMs with a simulator for scalable distributed-systems policy design, and reports preliminary throughput improvements.

Findings

01 Preliminary throughput gains across multiple models

02 Framework preserves interpretability and targeted search

03 Discussion on scaling and future directions

Abstract

We explore AI-driven distributed-systems policy design by combining stochastic code generation from large language models (LLMs) with deterministic verification in a domain-specific simulator. Using a Function-as-a-Service runtime (Bauplan) and its open-source simulator (Eudoxia) as a case study, we frame scheduler design as an iterative generate-and-verify loop: an LLM proposes a Python policy, the simulator evaluates it on standardized traces, and structured feedback steers subsequent generations. This setup preserves interpretability while enabling targeted search over a large design space. We detail the system architecture and report preliminary results on throughput improvements across multiple models. Beyond early gains, we discuss the limits of the current setup and outline next steps; in particular, we conjecture that AI will be crucial for scaling this methodology by helping to…
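The generate-and-verify loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `propose_policy` is a hypothetical stand-in for an LLM call, `simulate` is a toy single-worker simulator rather than Eudoxia, and the job format, trace, and candidate policies are invented for the example.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical job format: (arrival_time, duration). Not Eudoxia's trace format.
Job = Tuple[int, int]
Policy = Callable[[List[Job]], Job]
Feedback = List[Tuple[str, int]]  # (policy name, jobs completed)


def fifo(queue: List[Job]) -> Job:
    """Run the earliest-arrived job first."""
    return min(queue, key=lambda job: job[0])


def shortest_first(queue: List[Job]) -> Job:
    """Run the shortest job first."""
    return min(queue, key=lambda job: job[1])


# Assumption: a fixed pool of policies replaces free-form LLM code generation.
CANDIDATES: Dict[str, Policy] = {"fifo": fifo, "shortest_first": shortest_first}


def propose_policy(feedback: Feedback, rng: random.Random) -> str:
    """Stand-in for the LLM: use past feedback to avoid re-proposing a policy."""
    tried = {name for name, _ in feedback}
    untried = [name for name in CANDIDATES if name not in tried]
    return rng.choice(untried or list(CANDIDATES))


def simulate(policy: Policy, trace: List[Job], horizon: int) -> int:
    """Toy deterministic simulator: count jobs one worker finishes by `horizon`."""
    pending = sorted(trace)
    queue: List[Job] = []
    clock, done = 0, 0
    while True:
        # Admit every job that has arrived by the current time.
        while pending and pending[0][0] <= clock:
            queue.append(pending.pop(0))
        if not queue:
            if not pending:
                break
            clock = pending[0][0]  # idle until the next arrival
            continue
        job = policy(queue)
        queue.remove(job)
        clock += job[1]
        if clock > horizon:
            break
        done += 1
    return done


def generate_and_verify(trace: List[Job], horizon: int, rounds: int,
                        seed: int = 0) -> Tuple[str, int]:
    """Propose, evaluate, and feed results back; return the best (name, score)."""
    rng = random.Random(seed)
    feedback: Feedback = []
    for _ in range(rounds):
        name = propose_policy(feedback, rng)
        score = simulate(CANDIDATES[name], trace, horizon)
        feedback.append((name, score))
    return max(feedback, key=lambda item: item[1])


TRACE: List[Job] = [(0, 5), (0, 1), (0, 1), (1, 2)]

if __name__ == "__main__":
    print(generate_and_verify(TRACE, horizon=6, rounds=2))
```

The generator here is stochastic, while the simulator is deterministic, so any proposed policy always receives the same score on the same trace. That separation is what makes the feedback usable as a search signal.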

Taxonomy

Topics: Scientific Computing and Data Management · Advanced Software Engineering Methodologies · Machine Learning in Materials Science