Empirical Study on the Software Engineering Practices in Open Source ML Package Repositories
Minke Xiu, Ellis E. Eghan, Zhen Ming (Jack) Jiang, Bram Adams

TL;DR
This paper provides an empirical analysis of the structure, practices, and challenges of open source ML package repositories, comparing them with traditional software repositories to inform better sharing and reuse practices.
Contribution
It offers the first empirical comparison of ML package repositories like TFHub and PyTorch Hub with traditional repositories, highlighting unique practices and challenges.
Findings
ML repositories have distinct organizational structures.
Sharing practices differ from traditional software repositories.
Challenges in ML package sharing and reuse are identified.
Abstract
Recent advances in Artificial Intelligence (AI), especially in Machine Learning (ML), have introduced various practical applications (e.g., virtual personal assistants and autonomous cars) that enhance the experience of everyday users. However, developing, training, and deploying models with modern ML technologies such as Deep Learning requires considerable technical expertise and resources, which makes effective reuse of ML models a necessity. Public ML package repositories, which bundle pre-trained models into packages for publication, are emerging to support such discovery and reuse by practitioners and researchers. Since these repositories are a recent phenomenon, there is no empirical data on their current state and challenges. Hence, this paper conducts an exploratory study that analyzes the structure and contents of two popular ML package repositories, TFHub and PyTorch Hub, comparing their…
Taxonomy
Topics: Scientific Computing and Data Management · Software Engineering Research · Cloud Computing and Resource Management
