Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models
Nikhil Kandpal, Brian Lester, Mohammed Muqeeth, Anisha Mascarenhas, Monty Evans, Vishal Baskaran, Tenghao Huang, Haokun Liu, Colin Raffel

TL;DR
Git-Theta is a novel extension to Git that enables collaborative, fine-grained version control of machine learning models, supporting efficient updates, automatic merges, and meaningful change reports to facilitate shared model development.
Contribution
The paper introduces Git-Theta, the first version control system tailored to machine learning models, enabling structured tracking of model parameters and collaborative model development.
Findings
Supports communication-efficient model updates
Enables automatic merging of model versions
Provides meaningful diff reports for models
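The first and third findings both rest on tracking a checkpoint as a collection of named parameter groups rather than one opaque blob. The sketch below is a hypothetical illustration of that idea, not Git-Theta's code: it assumes a checkpoint is a plain dict mapping group names to flat lists of floats, hashes each group, and reports which groups were added, removed, or modified. Only added or modified groups would then need to be transferred.

```python
import hashlib
import struct
from typing import Dict, List

# Hypothetical model of a checkpoint: parameter-group name -> flat list of floats.
# Real checkpoints hold multi-dimensional tensors; this is a simplification.
Checkpoint = Dict[str, List[float]]


def group_hash(values: List[float]) -> str:
    """Hash a group's values packed as little-endian float64 bytes."""
    packed = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(packed).hexdigest()


def diff_checkpoints(old: Checkpoint, new: Checkpoint) -> Dict[str, List[str]]:
    """Report parameter groups that were added, removed, or modified."""
    old_hashes = {name: group_hash(v) for name, v in old.items()}
    new_hashes = {name: group_hash(v) for name, v in new.items()}
    return {
        "added": sorted(new_hashes.keys() - old_hashes.keys()),
        "removed": sorted(old_hashes.keys() - new_hashes.keys()),
        "modified": sorted(
            name
            for name in old_hashes.keys() & new_hashes.keys()
            if old_hashes[name] != new_hashes[name]
        ),
    }


if __name__ == "__main__":
    v1 = {"embed.weight": [0.1, 0.2], "head.weight": [1.0, -1.0], "head.bias": [0.0]}
    v2 = {"embed.weight": [0.1, 0.2], "head.weight": [1.5, -1.0], "adapter.weight": [0.3]}
    print(diff_checkpoints(v1, v2))
    # {'added': ['adapter.weight'], 'removed': ['head.bias'], 'modified': ['head.weight']}
```

Because unchanged groups such as `embed.weight` hash identically across versions, a per-group diff like this is what makes both a readable change report and a partial upload possible.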
Abstract
Currently, most machine learning models are trained by centralized teams and are rarely updated. In contrast, open-source software development involves the iterative development of a shared artifact through distributed collaboration using a version control system. In the interest of enabling collaborative and continual improvement of machine learning models, we introduce Git-Theta, a version control system for machine learning models. Git-Theta is an extension to Git, the most widely used version control software, that allows fine-grained tracking of changes to model parameters alongside code and other artifacts. Unlike existing version control systems that treat a model checkpoint as a blob of data, Git-Theta leverages the structure of checkpoints to support communication-efficient updates, automatic model merges, and meaningful reporting about the difference between two versions of a…
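Automatic merging can likewise operate per parameter group. The sketch below is a hypothetical three-way merge, not Git-Theta's actual merge logic, which this summary does not describe. It assumes the same dict-of-float-lists checkpoint model: if only one side changed a group, that side wins; if both changed it, the groups are averaged elementwise (parameter averaging is an illustrative choice here, not a claim about Git-Theta's default); a group modified on one side and deleted on the other is treated as a conflict.

```python
from typing import Dict, List, Optional

# Hypothetical model of a checkpoint: parameter-group name -> flat list of floats.
Checkpoint = Dict[str, List[float]]


def merge_group(base: Optional[List[float]], ours: List[float], theirs: List[float]) -> List[float]:
    """Three-way merge of a single parameter group present on both sides."""
    if ours == theirs:
        return list(ours)
    if base is not None and ours == base:
        return list(theirs)  # only "theirs" changed this group
    if base is not None and theirs == base:
        return list(ours)  # only "ours" changed this group
    # Both sides changed the group: illustrative strategy, elementwise average.
    if len(ours) != len(theirs):
        raise ValueError("cannot average parameter groups of different sizes")
    return [(a + b) / 2 for a, b in zip(ours, theirs)]


def merge_checkpoints(base: Checkpoint, ours: Checkpoint, theirs: Checkpoint) -> Checkpoint:
    """Merge two descendants of `base`, one parameter group at a time."""
    merged: Checkpoint = {}
    for name in sorted(ours.keys() | theirs.keys()):
        if name in ours and name in theirs:
            merged[name] = merge_group(base.get(name), ours[name], theirs[name])
            continue
        value = ours[name] if name in ours else theirs[name]
        if name in base:
            # The group was deleted on the other side.
            if value == base[name]:
                continue  # unchanged on this side, so accept the deletion
            raise ValueError(f"{name}: modified on one side, deleted on the other")
        merged[name] = list(value)  # newly added on one side only
    return merged
```

Averaging is only one way to resolve a group changed on both sides; a real tool could instead expose several strategies or fall back to a manual conflict.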
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Scientific Computing and Data Management
