Source Code Properties of Defective Infrastructure as Code Scripts
Akond Rahman, Laurie Williams

TL;DR
This study empirically investigates source code properties of Infrastructure as Code (IaC) scripts. It identifies indicators such as lines of code and hard-coded strings that correlate with defects, and develops prediction models to improve script quality.
Contribution
It identifies ten source code properties linked to defective IaC scripts and constructs defect prediction models using these properties, validated on multiple datasets.
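To make "source code property" concrete, here is a minimal sketch that counts three of the properties named on this page (lines of code, 'include' statements, and hard-coded strings) in a single script. It assumes Puppet-like syntax ('#' comments, 'include <class>' statements, quoted literals as hard-coded values). The function name extract_properties and the regex heuristics are hypothetical illustrations, not the authors' measurement procedure.

```python
import re

# Hypothetical heuristics, assuming Puppet-like syntax:
# 'include <class>' pulls in a class; quoted literals count as hard-coded strings.
INCLUDE_RE = re.compile(r"^\s*include\s+\S+")
STRING_RE = re.compile(r"'[^']*'|\"[^\"]*\"")


def extract_properties(script: str) -> dict:
    """Count lines of code, include statements, and quoted string literals."""
    loc = 0
    includes = 0
    hard_coded = 0
    for line in script.splitlines():
        stripped = line.strip()
        # Blank lines and full-line comments are not counted as lines of code.
        if not stripped or stripped.startswith("#"):
            continue
        loc += 1
        if INCLUDE_RE.match(stripped):
            includes += 1
        # Every quoted literal on the line is treated as a hard-coded string.
        hard_coded += len(STRING_RE.findall(stripped))
    return {"lines_of_code": loc, "include": includes, "hard_coded_string": hard_coded}


sample = """
# install the web server
include apache
package { 'nginx':
  ensure => '1.14.0',
}
"""
print(extract_properties(sample))
# {'lines_of_code': 4, 'include': 1, 'hard_coded_string': 2}
```

Per-script counts like these are the kind of feature vector a defect prediction model can be trained on; a real study would need a proper parser rather than line-level regexes.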
Findings
Lines of code and hard-coded strings strongly correlate with defects.
Surveyed practitioners agree that the 'include' and 'hard-coded string' properties are important.
Prediction models achieve 0.70-0.78 precision and 0.54-0.67 recall.
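Precision is the fraction of scripts predicted defective that really are defective, TP / (TP + FP); recall is the fraction of truly defective scripts the model finds, TP / (TP + FN). A minimal sketch of both metrics, with True meaning "defective"; the labels in the usage line are made up for illustration and are not data from the study.

```python
def precision_recall(actual, predicted):
    """Precision and recall for the 'defective' class (True = defective)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # defective, flagged
    fp = sum(1 for a, p in pairs if not a and p)      # clean, flagged
    fn = sum(1 for a, p in pairs if a and not p)      # defective, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for six scripts.
actual = [True, True, True, False, False, True]
predicted = [True, True, False, True, False, False]
print(precision_recall(actual, predicted))  # (0.6666666666666666, 0.5)
```

A model can have higher precision than recall, as in the ranges reported above, when it rarely flags clean scripts but still misses some defective ones.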
Abstract
Context: In continuous deployment, software and services are rapidly deployed to end-users using an automated deployment pipeline. Defects in infrastructure as code (IaC) scripts can hinder the reliability of the automated deployment pipeline. We hypothesize that certain properties of IaC source code, such as lines of code and hard-coded strings used as configuration values, show correlation with defective IaC scripts. Objective: The objective of this paper is to help practitioners increase the quality of infrastructure as code (IaC) scripts through an empirical study that identifies source code properties of defective IaC scripts. Methodology: We apply qualitative analysis on defect-related commits mined from open source software repositories to identify source code properties that correlate with defective IaC scripts. Next, we survey practitioners to assess the practitioner's…
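Mining defect-related commits yields per-commit judgments, while the prediction models work per script. One common convention in defect prediction, assumed here rather than confirmed by the abstract excerpt, is to label a script defective if at least one defect-related commit modified it. In this sketch the function label_scripts and the commit keys 'defect_related' and 'files' are hypothetical; the defect_related flag stands in for the outcome of the manual qualitative analysis.

```python
from typing import Dict, Iterable, Set


def label_scripts(all_scripts: Iterable[str], commits: Iterable[dict]) -> Dict[str, bool]:
    """Mark a script defective if any defect-related commit modified it.

    Each commit is a dict with hypothetical keys:
      'defect_related': bool, e.g. the result of manual commit analysis
      'files': list of script paths changed by the commit
    """
    defective: Set[str] = set()
    for commit in commits:
        if commit["defect_related"]:
            defective.update(commit["files"])
    # Scripts never touched by a defect-related commit are labeled clean.
    return {path: path in defective for path in all_scripts}


scripts = ["manifests/web.pp", "manifests/db.pp", "manifests/cache.pp"]
commits = [
    {"defect_related": True, "files": ["manifests/web.pp"]},
    {"defect_related": False, "files": ["manifests/db.pp"]},
]
print(label_scripts(scripts, commits))
# {'manifests/web.pp': True, 'manifests/db.pp': False, 'manifests/cache.pp': False}
```

These labels, paired with per-script property counts, form the training data a defect prediction model needs.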
