AutoDC: Automated data-centric processing
Zac Yung-Chun Liu, Shoumik Roychowdhury, Scott Tarlow, Akash Nair, Shweta Badhe, Tejas Shah

TL;DR
AutoDC is an automated tool designed to streamline data-centric tasks in machine learning, significantly reducing manual effort and improving model accuracy on image classification datasets.
Contribution
AutoDC introduces an automated approach to dataset improvement, addressing the artisanal nature of data fixing and augmentation in data-centric machine learning.
Findings
Reduces manual data improvement time by an estimated 80% in preliminary tests.
Improves model accuracy by 10-15% with the ML code held fixed.
Evaluated on 3 open source image classification datasets.
Abstract
AutoML (automated machine learning) has been extensively developed in the past few years for the model-centric approach. For the data-centric approach, the processes used to improve a dataset, such as fixing incorrect labels, adding examples that represent edge cases, and applying data augmentation, are still very artisanal and expensive. Here we develop an automated data-centric tool (AutoDC), similar in purpose to AutoML, that aims to speed up the dataset improvement process. In our preliminary tests on 3 open source image classification datasets, AutoDC is estimated to reduce the manual time for data improvement tasks by roughly 80% while improving model accuracy by 10-15% with the ML code held fixed.
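Fixing incorrect labels is one of the dataset improvement steps named in the abstract. This summary does not describe how AutoDC performs that step, so the following is only a minimal sketch of one common heuristic, not the authors' method. It assumes that out-of-sample predicted probabilities from some classifier are already available. The function name `flag_suspect_labels` and the toy data are hypothetical.

```python
from typing import List, Sequence


def flag_suspect_labels(
    pred_probs: Sequence[Sequence[float]],
    labels: Sequence[int],
) -> List[int]:
    """Return indices of examples whose given label looks doubtful.

    Hypothetical helper, NOT part of AutoDC. An example is flagged when the
    model prefers a different class AND its probability for the given label
    is below that class's average self-confidence.
    """
    num_classes = len(pred_probs[0])

    # Per-class threshold: mean probability assigned to class k over the
    # examples currently labeled k.
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for probs, label in zip(pred_probs, labels):
        sums[label] += probs[label]
        counts[label] += 1
    thresholds = [s / c if c else 0.0 for s, c in zip(sums, counts)]

    suspects = []
    for i, (probs, label) in enumerate(zip(pred_probs, labels)):
        predicted = max(range(num_classes), key=lambda k: probs[k])
        if predicted != label and probs[label] < thresholds[label]:
            suspects.append(i)
    return suspects


# Toy data (assumed): five examples, two classes.
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.2, 0.8],  # labeled 0, but the model strongly prefers class 1
    [0.1, 0.9],
    [0.3, 0.7],
]
labels = [0, 0, 0, 1, 1]
suspects = flag_suspect_labels(pred_probs, labels)
print(suspects)
```

Flagged indices are candidates for human review rather than automatic relabeling, since a confident model can still be wrong on genuine edge cases.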
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
