Can AI autonomously build, operate, and use the entire data stack?
Arvind Agarwal, Lisa Amini, Sameep Mehta, Horst Samulowitz, Kavitha Srinivas

TL;DR
This paper advocates for a paradigm shift towards fully autonomous AI-managed data estates, emphasizing holistic management of the entire data lifecycle to enhance efficiency and reduce human intervention.
Contribution
It proposes a comprehensive framework for autonomous management of the entire data stack using intelligent agents, moving beyond isolated AI applications in data components.
Findings
AI can potentially automate all stages of the data lifecycle.
Holistic autonomous data management can improve efficiency and reduce manual effort.
Open research questions highlight the need for further development in autonomous data systems.
Abstract
Enterprise data management is a monumental task. It spans data architecture and systems, integration, quality, governance, and continuous improvement. While AI assistants can help specific personas, such as data engineers and stewards, to navigate and configure the data stack, they fall far short of full automation. However, as AI becomes increasingly capable of tackling tasks that have previously resisted automation due to inherent complexities, we believe there is an imminent opportunity to target fully autonomous data estates. Currently, AI is used in different parts of the data stack, but in this paper, we argue for a paradigm shift from the use of AI in independent data component operations towards a more holistic and autonomous handling of the entire data lifecycle. Towards that end, we explore how each stage of the modern data stack can be autonomously managed by intelligent…
Taxonomy
Topics: Data Quality and Management · Scientific Computing and Data Management · Personal Information Management and User Behavior
