VeML: An End-to-End Machine Learning Lifecycle for Large-scale and High-dimensional Data
Van-Duc Le, Tien-Cuong Bui, Wen-Syan Li

TL;DR
VeML is a version management system for the end-to-end machine learning lifecycle on large-scale, high-dimensional data. It manages lifecycle versions, transfers lifecycles between similar datasets, and detects training-testing data mismatch to improve model reliability.
Contribution
This paper introduces VeML, a novel version management system that addresses high costs and data mismatch issues in end-to-end ML lifecycle management for large datasets.
Findings
Efficient similarity computation for large-scale, high-dimensional data.
Effective detection of training-testing data mismatch without labeled data.
Promising experimental results on real-world datasets.
Abstract
An end-to-end machine learning (ML) lifecycle consists of many iterative processes, from data preparation and ML model design to model training and deployment of the trained model for inference. Building an end-to-end lifecycle for an ML problem requires designing and executing many ML pipelines, which produces a huge number of lifecycle versions. This paper therefore introduces VeML, a Version management system dedicated to the end-to-end ML Lifecycle. Our system tackles several crucial problems that other systems have not solved. First, we address the high cost of building an ML lifecycle, especially for large-scale and high-dimensional datasets. We solve this problem by proposing to transfer the lifecycle of similar datasets managed in our system to the new training data. We design an algorithm based on the core set to compute similarity for large-scale, high-dimensional data…
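To make the core-set idea concrete, here is a minimal sketch of how a small representative subset can stand in for a large dataset when comparing two datasets. It is not VeML's algorithm, which the truncated abstract does not describe. It assumes farthest-point (k-center greedy) selection as the core-set method and a symmetric mean nearest-neighbor distance as the comparison measure. The function names `greedy_core_set` and `core_set_distance` are hypothetical.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_core_set(points, k, start=0):
    """Return k indices chosen by farthest-point (k-center greedy) selection.

    Assumption: this is a generic core-set heuristic, not the paper's method.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # Squared distance from every point to its nearest chosen center so far.
    nearest = [squared_distance(p, points[start]) for p in points]
    while len(chosen) < k:
        # The next center is the point farthest from all current centers.
        idx = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(idx)
        for i, p in enumerate(points):
            d = squared_distance(p, points[idx])
            if d < nearest[i]:
                nearest[i] = d
    return chosen


def core_set_distance(core_a, core_b):
    """Symmetric mean nearest-neighbor distance between two core sets.

    Hypothetical measure chosen for illustration; smaller means more similar.
    """
    def directed(src, dst):
        total = sum(
            math.sqrt(min(squared_distance(s, d) for d in dst)) for s in src
        )
        return total / len(src)

    return 0.5 * (directed(core_a, core_b) + directed(core_b, core_a))


if __name__ == "__main__":
    rng = random.Random(0)

    def blob(center, n, dim=8):
        # Synthetic Gaussian cluster, used only for this demonstration.
        return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n)]

    def core(points, k=10):
        return [points[i] for i in greedy_core_set(points, k)]

    a, b, c = blob(0.0, 200), blob(0.0, 200), blob(10.0, 200)
    print("a vs b:", core_set_distance(core(a), core(b)))
    print("a vs c:", core_set_distance(core(a), core(c)))
```

Comparing only the core sets keeps the cost tied to `k` rather than to the full dataset size, which is the usual motivation for core sets on large, high-dimensional data. The selection step in this sketch still scans every point once per chosen center.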
Taxonomy
Topics: Machine Learning and Data Classification · Scientific Computing and Data Management · Explainable Artificial Intelligence (XAI)
