Revisiting Dockerfiles in Open Source Software Over Time
Kalvin Eng, Abram Hindle

TL;DR
This study analyzes over 9.4 million Dockerfiles from 2013-2020 to validate previous findings, revealing trends in image usage, Dockerfile quality, and the evolution of Dockerfile formats over time.
Contribution
It provides a comprehensive historical analysis of Dockerfiles using the largest dataset to date, confirming prior trends and enhancing understanding of Dockerfile evolution.
Findings
Decline in OS image usage over time
Increase in language image usage
Slight decrease in Dockerfile smell counts
Abstract
Docker is becoming ubiquitous with containerization for developing and deploying applications. Previous studies have analyzed Dockerfiles that are used to create container images in order to better understand how to improve Docker tooling. These studies obtain Dockerfiles using either Docker Hub or GitHub. In this paper, we revisit the findings of previous studies using the largest set of Dockerfiles known to date, with over 9.4 million unique Dockerfiles found in the World of Code infrastructure spanning from 2013-2020. We contribute a historical view of the Dockerfile format by analyzing the Docker engine changelogs and use the history to enhance our analysis of Dockerfiles. We also reconfirm previous findings of a downward trend in using OS images and an upward trend of using language images. As well, we reconfirm that Dockerfile smell counts are slightly decreasing, meaning that…
