Shipwright: A Human-in-the-Loop System for Dockerfile Repair

Jordan Henkel; Denini Silva; Leopoldo Teixeira; Marcelo d'Amorim; Thomas Reps

arXiv:2103.02591·cs.SE·March 4, 2021

TL;DR

Shipwright is a human-in-the-loop system for repairing broken Dockerfiles. It embeds build logs with a modified BERT model to cluster similar failures, then applies rule-based repairs derived from those clusters. Using it, the authors submitted 45 pull requests to GitHub projects, 42.2% of which were accepted.

Contribution

The paper introduces Shipwright, a system that combines BERT-based clustering of build logs with rule-based Dockerfile repairs. The authors apply it to real-world GitHub projects and compare it with static analysis tools.
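
To make the idea of a rule-based repair concrete, here is a minimal Python sketch of how a build-log pattern could be paired with a Dockerfile rewrite. Everything in it is an assumption for illustration: the `RepairRule` type, the `propose_repair` function, and the single "stale-apt-index" rule are hypothetical and are not taken from the paper or from the Shipwright repository. The function only returns a proposal; a human is expected to review it before it is applied.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class RepairRule:
    """Hypothetical rule: a build-log pattern paired with a Dockerfile rewrite."""
    name: str
    log_pattern: "re.Pattern[str]"
    rewrite: Callable[[str], str]


def add_apt_update(dockerfile: str) -> str:
    """Prefix each `RUN apt-get install` with `apt-get update &&`."""
    fixed = []
    for line in dockerfile.splitlines():
        if line.strip().startswith("RUN apt-get install"):
            line = line.replace(
                "RUN apt-get install",
                "RUN apt-get update && apt-get install",
                1,
            )
        fixed.append(line)
    return "\n".join(fixed)


# Illustrative rule set (an assumption, not the paper's 13 rules).
RULES: List[RepairRule] = [
    RepairRule(
        name="stale-apt-index",
        log_pattern=re.compile(r"E: Unable to locate package"),
        rewrite=add_apt_update,
    ),
]


def propose_repair(build_log: str, dockerfile: str) -> Optional[Tuple[str, str]]:
    """Return (rule name, candidate Dockerfile) for the first matching rule.

    Returns None when no rule matches the log or the rewrite changes nothing.
    The candidate is a proposal for human review, not an applied fix.
    """
    for rule in RULES:
        if rule.log_pattern.search(build_log):
            candidate = rule.rewrite(dockerfile)
            if candidate != dockerfile:
                return rule.name, candidate
    return None
```

Calling `propose_repair` with a log containing `E: Unable to locate package curl` and a Dockerfile whose install step is `RUN apt-get install -y curl` yields a candidate in which that step becomes `RUN apt-get update && apt-get install -y curl`. Keeping the matching pattern and the rewrite together in one record makes each proposal traceable to the rule that produced it.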

Findings

01. Pull requests submitted with Shipwright had a 42.2% acceptance rate.
02. It proposed repairs equivalent to human patches in 22.77% of cases.
03. It detected issues in 73.25% of files, outperforming static analysis tools.

Abstract

Docker is a tool for lightweight OS-level virtualization. Docker images are created by performing a build, controlled by a source-level artifact called a Dockerfile. We studied Dockerfiles on GitHub, and -- to our great surprise -- found that over a quarter of the examined Dockerfiles failed to build (and thus to produce images). To address this problem, we propose SHIPWRIGHT, a human-in-the-loop system for finding repairs to broken Dockerfiles. SHIPWRIGHT uses a modified version of the BERT language model to embed build logs and to cluster broken Dockerfiles. Using these clusters and a search-based procedure, we were able to design 13 rules for making automated repairs to Dockerfiles. With the aid of SHIPWRIGHT, we submitted 45 pull requests (with a 42.2% acceptance rate) to GitHub projects with broken Dockerfiles. Furthermore, in a "time-travel" analysis of broken Dockerfiles that…

Code & Models

Repositories

STAR-RG/shipwright (Official)