Refactoring for Dockerfile Quality: A Dive into Developer Practices and Automation Potential
Emna Ksontini, Meriem Mastouri, Rania Khalsi, Wael Kessentini

TL;DR
This paper investigates automating Dockerfile refactoring with In-Context Learning (ICL), reporting an average 32% reduction in image size, a 6% decrease in build duration, and improved maintainability in most cases, and proposes it as a practical approach for continuous Dockerfile quality improvement.
Contribution
It introduces an ICL-based automated refactoring method for Dockerfiles, reports substantial improvements over manual refactoring and existing tools, and discusses its integration into CI/CD pipelines.
Findings
Average 32% reduction in image size
6% decrease in build duration
Improved understandability and maintainability in most cases
Abstract
Docker, the industry standard for packaging and deploying applications, leverages Infrastructure as Code (IaC) principles to facilitate the creation of images through Dockerfiles. However, maintaining Dockerfiles presents significant challenges. Refactoring, in particular, is often a manual and complex process. This paper explores the utility and practicality of automating Dockerfile refactoring using 600 Dockerfiles from 358 open-source projects. Our study reveals that Dockerfile image size and build duration tend to increase as projects evolve, with developers often postponing refactoring efforts until later stages in the development cycle. This trend motivates the automation of refactoring. To achieve this, we leverage In-Context Learning (ICL) along with a score-based demonstration selection strategy. Our approach leads to an average reduction of 32% in image size and a 6% decrease…
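The abstract names a score-based demonstration selection strategy for ICL but, as excerpted here, does not give the scoring function. The sketch below is one plausible reading, not the authors' method: each candidate before/after pair gets a score that combines similarity to the target Dockerfile with its measured improvements, and the top-k pairs become the few-shot examples in the prompt. The `Demonstration` type, the keyword-overlap similarity, the weights, and the prompt wording are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Demonstration:
    """Hypothetical before/after refactoring pair with its measured effect."""
    before: str            # original Dockerfile text
    after: str             # refactored Dockerfile text
    size_reduction: float  # fractional image-size reduction, e.g. 0.30
    time_reduction: float  # fractional build-time reduction, e.g. 0.05


def instruction_overlap(a: str, b: str) -> float:
    """Jaccard similarity over the first token of each non-comment line.

    This is a crude stand-in for a real similarity measure: continuation
    lines contribute their first token too.
    """
    def keywords(text: str) -> set:
        return {
            line.split()[0].upper()
            for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")
        }

    ka, kb = keywords(a), keywords(b)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def score(demo: Demonstration, target: str,
          w_sim: float = 0.5, w_size: float = 0.4, w_time: float = 0.1) -> float:
    """Weighted score; the weights are illustrative assumptions."""
    return (w_sim * instruction_overlap(demo.before, target)
            + w_size * demo.size_reduction
            + w_time * demo.time_reduction)


def select_demonstrations(pool, target: str, k: int = 2):
    """Return the k highest-scoring demonstrations for this target."""
    return sorted(pool, key=lambda d: score(d, target), reverse=True)[:k]


def build_prompt(demos, target: str) -> str:
    """Assemble a few-shot prompt: instruction, examples, then the target."""
    parts = ["Refactor the Dockerfile to reduce image size and build time."]
    for i, d in enumerate(demos, 1):
        parts.append(f"### Example {i}\nBefore:\n{d.before}\nAfter:\n{d.after}")
    parts.append(f"### Target\nBefore:\n{target}\nAfter:")
    return "\n\n".join(parts)
```

In use, `select_demonstrations(pool, target)` would be called once per Dockerfile to refactor, and the string from `build_prompt` passed to whichever language model is available. Scoring on past measured gains as well as similarity is a design choice that favors examples known to have paid off, at the cost of needing build measurements for every pair in the pool.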
Taxonomy
Topics: Advanced Data Storage Technologies · Cloud Computing and Resource Management · Scientific Computing and Data Management
