TL;DR
Shipwright is a human-in-the-loop system for finding and repairing broken Dockerfiles. It clusters failing builds using BERT embeddings of their build logs and applies rule-based repairs derived from those clusters.
Contribution
The paper introduces Shipwright, a system that combines BERT-based clustering of build logs with 13 rule-based repairs for Dockerfiles. It evaluates the system through pull requests to real GitHub projects and compares it against static analysis tools.
Findings
Of the 45 pull requests submitted with Shipwright's aid, 42.2% were accepted.
It proposed repairs equivalent to human patches in 22.77% of cases.
It detected issues in 73.25% of files, outperforming static analysis tools.
Abstract
Docker is a tool for lightweight OS-level virtualization. Docker images are created by performing a build, controlled by a source-level artifact called a Dockerfile. We studied Dockerfiles on GitHub, and -- to our great surprise -- found that over a quarter of the examined Dockerfiles failed to build (and thus to produce images). To address this problem, we propose SHIPWRIGHT, a human-in-the-loop system for finding repairs to broken Dockerfiles. SHIPWRIGHT uses a modified version of the BERT language model to embed build logs and to cluster broken Dockerfiles. Using these clusters and a search-based procedure, we were able to design 13 rules for making automated repairs to Dockerfiles. With the aid of SHIPWRIGHT, we submitted 45 pull requests (with a 42.2% acceptance rate) to GitHub projects with broken Dockerfiles. Furthermore, in a "time-travel" analysis of broken Dockerfiles that…
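The abstract describes repair rules that are selected based on build logs and applied to broken Dockerfiles. A minimal sketch of what one such rule could look like follows. The rule itself (adding `apt-get update` when the log reports an unlocatable package), the `RepairRule` type, and the `propose_repair` function are hypothetical illustrations and are not taken from the paper. Simple regex matching stands in for the BERT-based clustering that the authors used to discover their rules.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RepairRule:
    """Hypothetical rule: a build-log pattern paired with a Dockerfile rewrite."""
    name: str
    log_pattern: "re.Pattern[str]"
    rewrite: Callable[[str], str]


def add_apt_update(dockerfile: str) -> str:
    """Prefix `apt-get install` with `apt-get update &&` in RUN lines lacking it."""
    out = []
    # keepends=True preserves the original line endings, including a trailing newline.
    for line in dockerfile.splitlines(keepends=True):
        is_run = line.lstrip().upper().startswith("RUN")
        if is_run and "apt-get install" in line and "apt-get update" not in line:
            line = line.replace(
                "apt-get install", "apt-get update && apt-get install", 1
            )
        out.append(line)
    return "".join(out)


# Illustrative rule set (assumption: not one of the paper's 13 rules).
RULES = [
    RepairRule(
        name="apt-update-before-install",
        log_pattern=re.compile(r"Unable to locate package"),
        rewrite=add_apt_update,
    ),
]


def propose_repair(dockerfile: str, build_log: str) -> Optional[str]:
    """Return a candidate repaired Dockerfile, or None if no rule applies."""
    for rule in RULES:
        if rule.log_pattern.search(build_log):
            candidate = rule.rewrite(dockerfile)
            if candidate != dockerfile:
                return candidate
    return None


if __name__ == "__main__":
    broken = "FROM ubuntu:20.04\nRUN apt-get install -y curl\n"
    log = "E: Unable to locate package curl"
    print(propose_repair(broken, log))
```

In a human-in-the-loop setting, a candidate returned by `propose_repair` would still need to be rebuilt and reviewed by a person before being submitted as a pull request.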
