Deployability-Centric Infrastructure-as-Code Generation: Fail, Learn, Refine, and Succeed through LLM-Empowered DevOps Simulation
Tianyi Zhang, Shidong Pan, Zejun Zhang, Zhenchang Xing, Xiaoyu Sun

TL;DR
This paper introduces DPIaC-Eval, a deployability-focused benchmark for IaC templates, and proposes IaCGen, an LLM-based iterative framework that significantly improves deployability success rates in cloud infrastructure automation.
Contribution
It presents the first deployability-centric IaC benchmark and an LLM-driven framework that uses iterative feedback to enhance the deployability of generated infrastructure templates.
Findings
LLMs perform poorly on deployability, achieving only a 20.8-30.2% deployment success rate on the first attempt.
IaCGen improves deployability to over 90% with iterative feedback.
Human feedback further boosts success rates and highlights trustworthiness issues.
Abstract
Infrastructure-as-Code (IaC) generation holds significant promise for automating cloud infrastructure provisioning. Recent advances in Large Language Models (LLMs) present an opportunity to democratize IaC development by generating deployable infrastructure templates from natural language descriptions. However, current evaluations focus on syntactic correctness and ignore deployability, the critical measure of an IaC configuration file's utility. Six state-of-the-art LLMs performed poorly on deployability, achieving only a 20.8-30.2% deployment success rate on the first attempt. In this paper, we construct DPIaC-Eval, the first deployability-centric IaC template benchmark, consisting of 153 real-world scenarios across 58 unique services. We also propose an LLM-based deployability-centric framework, dubbed IaCGen, that uses an iterative feedback mechanism encompassing…
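The abstract describes IaCGen's feedback mechanism only at a high level, so the following is a minimal sketch of a generic generate-deploy-refine loop, not the authors' implementation. It assumes a hypothetical `generate` callable standing in for the LLM and a hypothetical `deploy` callable standing in for a real or simulated deployment that returns a success flag and an error message. All names (`generate_until_deployable`, `Attempt`, `success_rates`, `fake_generate`, `fake_deploy`) are illustrative. `success_rates` separates first-attempt success from eventual success, the two quantities contrasted in the Findings.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interfaces (not from the paper):
#   Generator: (scenario description, accumulated error feedback) -> IaC template text
#   Deployer:  template text -> (deployed successfully?, error message)
Generator = Callable[[str, List[str]], str]
Deployer = Callable[[str], Tuple[bool, str]]


@dataclass
class Attempt:
    template: str
    success: bool
    error: str


def generate_until_deployable(
    description: str,
    generate: Generator,
    deploy: Deployer,
    max_iters: int = 5,
) -> List[Attempt]:
    """Generate a template, try to deploy it, and feed errors back until it deploys."""
    feedback: List[str] = []
    attempts: List[Attempt] = []
    for _ in range(max_iters):
        template = generate(description, feedback)
        ok, error = deploy(template)
        attempts.append(Attempt(template, ok, error))
        if ok:
            break  # deployable template found; stop refining
        feedback.append(error)  # the error becomes context for the next generation
    return attempts


def success_rates(runs: List[List[Attempt]]) -> Tuple[float, float]:
    """Return (first-attempt success rate, eventual success rate) over scenarios."""
    if not runs:
        return 0.0, 0.0
    first = sum(1 for r in runs if r and r[0].success) / len(runs)
    eventual = sum(1 for r in runs if any(a.success for a in r)) / len(runs)
    return first, eventual


# Toy stand-ins: the "deployer" rejects any template lacking an Encryption
# property, and the "LLM" adds it only after seeing that error. The rule is
# arbitrary and does not reflect any real cloud provider's validation.
def fake_generate(description: str, feedback: List[str]) -> str:
    template = "Resources:\n  Store:\n    Type: Example::Storage::Bucket\n"
    if any("Encryption" in msg for msg in feedback):
        template += "    Properties:\n      Encryption: enabled\n"
    return template


def fake_deploy(template: str) -> Tuple[bool, str]:
    if "Encryption" in template:
        return True, ""
    return False, "Missing required property: Encryption"


if __name__ == "__main__":
    attempts = generate_until_deployable("an encrypted bucket", fake_generate, fake_deploy)
    print([a.success for a in attempts])  # [False, True]
    print(success_rates([attempts]))  # (0.0, 1.0)
```

This sketch keeps the full error history as feedback. Passing only the latest error, or appending a human reviewer's comments to `feedback`, are equally plausible variants.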
Taxonomy
Topics: Software-Defined Networks and 5G · Software System Performance and Reliability · IoT and Edge/Fog Computing
