How do Machine Learning Models Change?
Joel Castaño, Rafael Cabañas, Antonio Salmerón, David Lo, Silverio Martínez-Fernández

TL;DR
This study analyzes over 680,000 commits from 100,000 models on Hugging Face to understand how machine learning models evolve over time, revealing patterns aligned with data science methodologies and differences based on project popularity and collaboration.
Contribution
The study provides the first large-scale longitudinal analysis of ML model evolution on a community platform, applying an extended change taxonomy and Bayesian modeling to uncover development patterns.
Findings
Commit activities follow data science methodologies like CRISP-DM.
Release patterns consolidate significant updates, especially in outputs and documentation.
Popular projects tend to start from more mature baselines with fewer foundational commits.
Abstract
The proliferation of Machine Learning (ML) models and their open-source implementations has transformed Artificial Intelligence research and applications. Platforms like Hugging Face (HF) enable this evolving ecosystem, yet a large-scale longitudinal study of how these models change is lacking. This study addresses this gap by analyzing over 680,000 commits from 100,000 models and 2,251 releases from 202 of these models on HF using repository mining and longitudinal methods. We apply an extended ML change taxonomy to classify commits and use Bayesian networks to model temporal patterns in commit and release activities. Our findings show that commit activities align with established data science methodologies, such as the Cross-Industry Standard Process for Data Mining (CRISP-DM), emphasizing iterative refinement. Release patterns tend to consolidate significant updates, particularly in…
Taxonomy
Topics: Scientific Computing and Data Management · Ethics and Social Impacts of AI · Software Engineering Research
Methods: ALIGN
