MGit: A Model Versioning and Management System
Wei Hao, Daniel Mendoza, Rafael da Silva, Deepak Narayanan, and Amar Phanishayee

TL;DR
MGit is a system designed to efficiently manage, version, and track the lineage of machine learning models, reducing storage costs and improving collaboration and update processes.
Contribution
MGit introduces a lineage graph that records how models are derived from one another, together with storage optimizations for the models in that graph, enabling efficient versioning, provenance tracking, and automatic updates of downstream models.
Findings
Reduces the storage footprint of the lineage graph by up to 7x
Enables automatic downstream model updates
Facilitates better collaboration and testing
Abstract
Models derived from other models are extremely common in machine learning (ML) today. For example, transfer learning is used to create task-specific models from "pre-trained" models through finetuning. This has led to an ecosystem where models are related to each other, sharing structure and often even parameter values. However, it is hard to manage these model derivatives: the storage overhead of storing all derived models quickly becomes onerous, prompting users to get rid of intermediate models that might be useful for further analysis. Additionally, undesired behaviors in models are hard to track down (e.g., is a bug inherited from an upstream model?). In this paper, we propose a model versioning and management system called MGit that makes it easier to store, test, update, and collaborate on model derivatives. MGit introduces a lineage graph that records provenance and versioning…
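The lineage graph idea described in the abstract can be illustrated with a small sketch. This is not MGit's actual API: the class `LineageGraph`, its methods, the edge-type labels, and the model names below are all hypothetical, chosen only to show how recording provenance and version edges lets one ask "which upstream models could a bug come from?" and "which downstream models need updating?".

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage graph (hypothetical, not MGit's API).

    Nodes are model names; each edge points from a parent model to a
    model derived from it and carries an edge type.
    """

    EDGE_TYPES = ("provenance", "version")  # assumed labels for illustration

    def __init__(self):
        self.nodes = set()
        self.children = defaultdict(list)  # parent -> [(child, edge_type)]
        self.parents = defaultdict(list)   # child  -> [(parent, edge_type)]

    def add_edge(self, parent, child, edge_type="provenance"):
        if edge_type not in self.EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        self.nodes.update((parent, child))
        self.children[parent].append((child, edge_type))
        self.parents[child].append((parent, edge_type))

    def _walk(self, start, adjacency):
        # Breadth-first traversal; returns reachable nodes, excluding start.
        seen, order, queue = {start}, [], deque([start])
        while queue:
            node = queue.popleft()
            for neighbor, _edge_type in adjacency.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    order.append(neighbor)
                    queue.append(neighbor)
        return order

    def ancestors(self, name):
        """Upstream models a behavior could have been inherited from."""
        return self._walk(name, self.parents)

    def descendants(self, name):
        """Downstream models that may need re-deriving after an update."""
        return self._walk(name, self.children)


# Hypothetical model names, for illustration only.
g = LineageGraph()
g.add_edge("bert-base", "bert-sst2")                # finetuned derivative
g.add_edge("bert-base", "bert-mnli")                # finetuned derivative
g.add_edge("bert-sst2", "bert-sst2-pruned")         # derivative of a derivative
g.add_edge("bert-base", "bert-base-v2", "version")  # newer version of the base

upstream = g.ancestors("bert-sst2-pruned")
downstream = g.descendants("bert-base")
```

In this sketch, `upstream` lists the models to inspect when a bug shows up in `bert-sst2-pruned`, and `downstream` lists the models that would be candidates for an update when `bert-base` changes. How MGit actually stores parameters and performs updates is not covered here.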
Taxonomy
Topics: Scientific Computing and Data Management · Software System Performance and Reliability · Data Quality and Management
