Data-centric AI: Perspectives and Challenges
Daochen Zha, Zaid Pervaiz Bhat, Kwei-Herng Lai, Fan Yang, Xia Hu

TL;DR
This paper discusses the emerging importance of data-centric AI, emphasizing the need for high-quality data in AI development, and outlines key missions, challenges, and perspectives for advancing this paradigm shift.
Contribution
It provides a comprehensive overview of data-centric AI, integrating various initiatives and highlighting open challenges to foster community efforts.
Findings
Identifies three main missions: training data development, inference data development, and data maintenance.
Highlights the importance of data quality and reliability in AI systems.
Lists open challenges to guide future research in data-centric AI.
Abstract
The role of data in building AI systems has recently been significantly magnified by the emerging concept of data-centric AI (DCAI), which advocates a fundamental shift from model advancements to ensuring data quality and reliability. Although our community has continuously invested effort in enhancing data in different aspects, these efforts are often isolated initiatives on specific tasks. To facilitate the collective initiative in our community and push forward DCAI, we draw a big picture and bring together three general missions: training data development, inference data development, and data maintenance. We provide a top-level discussion on representative DCAI tasks and share perspectives. Finally, we list open challenges. More resources are summarized at https://github.com/daochenzha/data-centric-AI
Taxonomy
Topics: Data Quality and Management · Time Series Analysis and Forecasting · Machine Learning and Data Classification
