GenSIaC: Toward Security-Aware Infrastructure-as-Code Generation with Large Language Models
Yikun Li, Matteo Grella, Daniel Nahmias, Gal Engelberg, Dan Klein, Giancarlo Guizzardi, Thijs van Ede, Andrea Continella

TL;DR
This paper explores how Large Language Models can be fine-tuned with a specialized dataset to generate security-aware Infrastructure as Code, significantly reducing security vulnerabilities and misconfigurations.
Contribution
The paper introduces GenSIaC, a fine-tuning dataset that enhances LLMs' ability to recognize and generate secure IaC scripts, addressing security weaknesses in current models.
Findings
After fine-tuning on GenSIaC, the F1-score for recognizing security weaknesses improved from 0.303 to 0.858
Models effectively identify major IaC security weaknesses
GenSIaC demonstrates good generalizability across LLMs and languages
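For readers unfamiliar with the metric in the first finding, the F1-score is the harmonic mean of precision and recall. The sketch below shows how it could be computed from detection counts. The function name and the example counts are illustrative assumptions and are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Compute F1 from true positives, false positives, and false negatives.

    Returns 0.0 when there are no true positives, since precision and
    recall are then zero or undefined.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # share of flagged weaknesses that are real
    recall = tp / (tp + fn)     # share of real weaknesses that were flagged
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 8 weaknesses correctly flagged, 2 false alarms, 2 missed.
    print(round(f1_score(tp=8, fp=2, fn=2), 3))
```

Because F1 penalizes both false alarms and missed weaknesses, a higher value means a better balance between the two error types.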
Abstract
In recent years, Infrastructure as Code (IaC) has emerged as a critical approach for managing and provisioning IT infrastructure through code and automation. IaC enables organizations to create scalable and consistent environments, effectively managing servers and development settings. However, the growing complexity of cloud infrastructures has led to an increased risk of misconfigurations and security vulnerabilities in IaC scripts. To address this problem, this paper investigates the potential of Large Language Models (LLMs) in generating security-aware IaC code, avoiding misconfigurations introduced by developers and administrators. While LLMs have made significant progress in natural language processing and code generation, their ability to generate secure IaC scripts remains unclear. This paper addresses two major problems: 1) the lack of understanding of security weaknesses in…
Taxonomy
Topics: Software System Performance and Reliability · Information and Cyber Security · Security and Verification in Computing
