Large Language Models as Realistic Microservice Trace Generators
Donghyun Kim, Sriram Ravula, Taemin Ha, Alexandros G. Dimakis, Daehyeok Kim, Aditya Akella

TL;DR
This paper introduces TraceLLM, a large language model trained to generate realistic synthetic microservice call graph traces, improving over existing methods by capturing complex structures and enabling various trace-related tasks.
Contribution
It presents a novel LLM-based approach for synthetic microservice trace generation, incorporating recursive training and instruction tuning for enhanced realism and applicability.
Findings
TraceLLM produces diverse, realistic traces and outperforms existing methods.
Generated traces effectively replace real data in microservice management tasks.
TraceLLM adapts to downstream tasks like feature prediction and data infilling.
Abstract
Workload traces are essential to understand complex computer systems' behavior and manage processing and memory resources. Since real-world traces are hard to obtain, synthetic trace generation is a promising alternative. This paper proposes a first-of-a-kind approach that relies on training a large language model (LLM) to generate synthetic workload traces, specifically microservice call graphs. To capture complex and arbitrary hierarchical structures and implicit constraints in such traces, we propose to train LLMs to generate recursively, making call graph generation a sequence of more manageable steps. To further enforce learning constraints on the traces and generate uncommon situations, we apply additional instruction tuning steps to align our model with the desired trace features. With this method, we train TraceLLM, an LLM for microservice trace generation, and demonstrate that…
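The recursive generation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Call` type, the `toy_layer_generator` function, and the depth-budget convention are all hypothetical stand-ins, and a deterministic placeholder takes the place of the LLM. The sketch shows only the control flow, in which each step produces the direct children of one call and passes a reduced budget down to the next step.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Call:
    """One node of a call graph (hypothetical structure, for illustration only)."""
    service: str
    depth: int
    children: List["Call"] = field(default_factory=list)


# Hypothetical stand-in for a single model invocation: given a parent call and
# the remaining depth budget, return the names of the services it calls next.
LayerGenerator = Callable[[Call, int], List[str]]


def toy_layer_generator(parent: Call, remaining_depth: int) -> List[str]:
    """Deterministic placeholder for the LLM: fan out by two until the budget is spent."""
    if remaining_depth == 0:
        return []
    return [f"{parent.service}.{i}" for i in range(2)]


def expand(node: Call, remaining_depth: int, generate_layer: LayerGenerator) -> None:
    """Generate one layer under `node`, then recurse into each child with a smaller budget."""
    for name in generate_layer(node, remaining_depth):
        child = Call(service=name, depth=node.depth + 1)
        node.children.append(child)
        expand(child, remaining_depth - 1, generate_layer)


def generate_call_graph(
    root_service: str,
    max_depth: int,
    generate_layer: LayerGenerator = toy_layer_generator,
) -> Call:
    """Build a whole call graph as a sequence of small per-layer generation steps."""
    root = Call(service=root_service, depth=0)
    expand(root, max_depth, generate_layer)
    return root


def count_calls(node: Call) -> int:
    """Total number of calls in the graph rooted at `node`."""
    return 1 + sum(count_calls(child) for child in node.children)


def deepest_level(node: Call) -> int:
    """Depth of the deepest call in the graph rooted at `node`."""
    return max([node.depth] + [deepest_level(child) for child in node.children])


graph = generate_call_graph("frontend", max_depth=2)
print(count_calls(graph), deepest_level(graph))
```

Because every step sees only one parent and an explicit budget, a constraint such as maximum depth can be checked at each step, rather than only once a full trace has been emitted.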
Peer Reviews
Decision·Submitted to ICLR 2025
1. The paper is well-written. 2. The authors solve an important problem: generating data for microservice calls, which can be difficult to acquire because of privacy and performance constraints. 3. Generating structured data is an interesting domain.
1. The model is evaluated on only one dataset, so the results may not generalize to datasets with different characteristics. 2. The paper shows results for LLaMa-2-7B but does not evaluate larger models. Will the proposed approach remain as useful as the model scales? 3. The model might overfit to certain structures if the training data does not capture enough diversity.
The paper introduces an approach to generating synthetic microservice call graphs using large language models (LLMs), which is an interesting idea for system workload tracing. It attempts to handle complex and arbitrary hierarchical structures and implicit constraints within microservice call graphs through a recursive generation method. The paper highlights the potential of synthetic traces to replace real-world data in optimizing and tuning system management tasks, offering significant advant
1. I have serious doubts about the motivation of this paper: Do we really need synthetic traces in the microservices trace domain? 1) None of the three "synthetic trace generation" methods mentioned by the authors are for generating microservices traces: (Bergsma et al., 2021) is for generating cloud workloads, and (Jiang et al., 2023; Yin et al., 2022) are for producing network traces. 2) The authors only made a brief statement about the motivation, "Obtaining real-world traces is often h
+ The data synthesized by the proposed generator based on fine-tuned LLM with the use of recursive call graph generation is able to match the performance of real data traces when used to train ML models. + The LLM is able to incorporate specifications from users. + Ablation study to evaluate the impact of recursive generation and intermediate instructions
- One of the motivations for using synthetic data is to preserve privacy, but how much private information LLMs can leak is still an open research question. - Since the paper aims to generate (call) graphs, it would have been nice to include a generator for attributed graphs as a baseline. - While the authors use three baselines (TVAE, GReaT, and a probabilistic model) when studying and comparing the distribution similarity between real and synthetic traces, the results on downstream ut
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Software Engineering Research
Methods: ALIGN
