DataCI: A Platform for Data-Centric AI on Streaming Data
Huaizheng Zhang, Yizheng Huang, Yuanming Li

TL;DR
DataCI is an open-source platform that facilitates data-centric AI development on streaming data through infrastructure, versioning, and a user-friendly interface, aiming to enhance streaming data management and analysis.
Contribution
It introduces a comprehensive platform with APIs, version control, and a graphical interface tailored for data-centric AI in streaming data environments.
Findings
Demonstrates ease of use and effectiveness of DataCI
Preliminary studies suggest DataCI could change how data-centric AI is practiced on streaming data
Highlights the platform's capabilities in managing and analyzing streaming data
Abstract
We introduce DataCI, a comprehensive open-source platform designed specifically for data-centric AI in dynamic streaming data settings. DataCI provides 1) an infrastructure with rich APIs for seamless streaming dataset management and for data-centric pipeline development and evaluation in streaming scenarios, 2) a carefully designed version control function to track pipeline lineage, and 3) an intuitive graphical interface for a better interactive user experience. Preliminary studies and demonstrations attest to the ease of use and effectiveness of DataCI, highlighting its potential to revolutionize the practice of data-centric AI in streaming data contexts.
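To make the idea of tracking pipeline lineage concrete, the following is a minimal sketch of how versioned dataset snapshots and pipeline runs could be linked. It is an illustration only and does not use or reproduce DataCI's actual API: the names `LineageStore`, `content_version`, `commit`, `run`, and `lineage` are hypothetical, and identifying versions by a hash of their content is an assumption made for this example.

```python
import hashlib
import json
from dataclasses import dataclass, field


def content_version(records):
    """Derive a short version id from the records' canonical JSON form."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class LineageStore:
    """Hypothetical in-memory store (an assumption, not DataCI's API)."""
    datasets: dict = field(default_factory=dict)  # version id -> records
    parents: dict = field(default_factory=dict)   # output version -> (pipeline, input version)

    def commit(self, records):
        """Store a dataset snapshot and return its content-derived version id."""
        version = content_version(records)
        self.datasets[version] = records
        return version

    def run(self, pipeline_name, pipeline_fn, input_version):
        """Apply a pipeline to a stored version and record where the output came from."""
        output = pipeline_fn(self.datasets[input_version])
        output_version = self.commit(output)
        self.parents[output_version] = (pipeline_name, input_version)
        return output_version

    def lineage(self, version):
        """Walk back from a version to its source, oldest step first."""
        chain, seen = [], set()
        while version in self.parents and version not in seen:
            seen.add(version)  # guard against cycles, e.g. a no-op pipeline
            name, parent = self.parents[version]
            chain.append((name, parent, version))
            version = parent
        chain.reverse()
        return chain


def drop_empty(rows):
    """Strip whitespace and drop rows whose text is empty."""
    return [{"text": r["text"].strip()} for r in rows if r["text"].strip()]


def lowercase(rows):
    """Lowercase the text field of every row."""
    return [{"text": r["text"].lower()} for r in rows]


store = LineageStore()
raw = store.commit([{"text": " Hello "}, {"text": ""}, {"text": "World"}])
cleaned = store.run("drop_empty", drop_empty, raw)
final = store.run("lowercase", lowercase, cleaned)
steps = store.lineage(final)
```

Here `steps` lists each pipeline step together with its input and output version ids, so any derived dataset can be traced back to the raw batch that produced it. In a streaming setting, each incoming batch would be committed as its own snapshot under the same scheme.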
Taxonomy
Topics: Data Stream Mining Techniques · Advanced Database Systems and Queries · Time Series Analysis and Forecasting
