Data-Centric AI Requires Rethinking Data Notion
Mustafa Hajij, Ghada Zamzmi, Karthikeyan Natesan Ramamurthy, Aldo Guzman Saenz

TL;DR
This paper advocates for a fundamental rethinking of data concepts in AI, proposing categorical and cochain frameworks to unify data understanding and improve machine learning package development.
Contribution
It introduces a unifying theoretical framework based on categorical and cochain notions of data to advance data-centric AI practices.
Findings
Proposes categorical and cochain principles as unifying data notions.
Highlights their impact on machine learning package development.
Discusses their importance for the transition to data-centric AI.
Abstract
The transition toward data-centric AI requires revisiting the notion of data from both mathematical and implementational standpoints in order to obtain unified data-centric machine learning packages. To this end, this work proposes unifying principles offered by the categorical and cochain notions of data, and discusses their importance in the data-centric AI transition. In the categorical notion, data is viewed as a mathematical structure that is acted upon via morphisms, which preserve this structure. In the cochain notion, data is viewed as a function defined on a discrete domain of interest and acted upon via operators. Although these two notions are almost orthogonal, they provide a unifying way to view data, ultimately shaping how machine learning packages are developed, implemented, and used by practitioners.
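The two notions can be illustrated on a toy graph. The following is a minimal sketch, not the authors' implementation. It assumes the discrete domain is a small oriented graph and treats graph homomorphisms as the structure-preserving morphisms. The helper names (`coboundary`, `coboundary_adjoint`, `laplacian`, `is_graph_morphism`, `pullback`) are hypothetical. Data is a 0-cochain (one value per vertex), operators such as the coboundary and the graph Laplacian act on it, and a structure-preserving map between graphs transports cochains by pullback.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # oriented edge (tail, head); toy representation, an assumption


# --- Cochain notion: data as a function on a discrete domain, acted on by operators ---

def coboundary(f: List[float], edges: List[Edge]) -> List[float]:
    """Map a 0-cochain f (one value per vertex) to a 1-cochain
    (one value per oriented edge): (d0 f)(u, v) = f(v) - f(u)."""
    return [f[v] - f[u] for u, v in edges]


def coboundary_adjoint(g: List[float], edges: List[Edge], n_vertices: int) -> List[float]:
    """Map a 1-cochain g back to a 0-cochain via d0^T:
    each edge value is subtracted at its tail and added at its head."""
    out = [0.0] * n_vertices
    for (u, v), value in zip(edges, g):
        out[u] -= value
        out[v] += value
    return out


def laplacian(f: List[float], edges: List[Edge], n_vertices: int) -> List[float]:
    """Graph Laplacian L0 = d0^T d0, another operator acting on 0-cochains."""
    return coboundary_adjoint(coboundary(f, edges), edges, n_vertices)


# --- Categorical notion: structure-preserving maps between domains ---

def is_graph_morphism(phi: Dict[int, int], src_edges: List[Edge], tgt_edges: List[Edge]) -> bool:
    """True if the vertex map phi sends every source edge to a target edge
    (orientation ignored), i.e. phi preserves the graph structure."""
    targets = {frozenset(e) for e in tgt_edges}
    return all(frozenset((phi[u], phi[v])) in targets for u, v in src_edges)


def pullback(g: List[float], phi: Dict[int, int], n_src_vertices: int) -> List[float]:
    """Transport a 0-cochain g on the target back along phi: (phi* g)(x) = g(phi(x))."""
    return [g[phi[x]] for x in range(n_src_vertices)]


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (0, 2)]  # toy 4-vertex graph
    f = [1.0, 3.0, 6.0, 10.0]                 # data: one value per vertex

    print(coboundary(f, edges))     # [2.0, 3.0, 4.0, 5.0]
    print(laplacian(f, edges, 4))   # [-7.0, -1.0, 4.0, 4.0]

    path = [(0, 1), (1, 2)]         # path graph on 3 vertices
    phi = {0: 0, 1: 1, 2: 2}
    print(is_graph_morphism(phi, path, edges))                 # True
    print(is_graph_morphism({0: 0, 1: 3, 2: 2}, path, edges))  # False
    print(pullback(f, phi, 3))      # [1.0, 3.0, 6.0]
```

The operators never inspect how the graph is stored; they only need the domain's incidence structure. The morphism check, by contrast, only asks whether structure is preserved. When a morphism also preserves edge orientation, pulling back commutes with the coboundary, since g(phi(v)) - g(phi(u)) is the value of d0 g on the image edge. This is one concrete point where the two views meet.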
Taxonomy
Topics: Topological and Geometric Data Analysis · Anomaly Detection Techniques and Applications · Machine Learning and Data Classification
