Data Agents: Levels, State of the Art, and Open Problems

Yuyu Luo; Guoliang Li; Ju Fan; Nan Tang

arXiv:2602.04261·cs.DB·February 5, 2026

TL;DR

This paper introduces a hierarchical taxonomy of data agents from no autonomy to full autonomy, reviews current systems, and discusses future research challenges for autonomous data management and analysis.

Contribution

It proposes the first comprehensive taxonomy of data agents, clarifies capability boundaries, and provides a research roadmap for advancing autonomous data systems.

Findings

01. Present the L0-L5 taxonomy of data agents.

02. Review current L0-L2 data management systems.

03. Identify challenges for L4 and L5 autonomous data agents.

Abstract

Data agents are an emerging paradigm that leverages large language models (LLMs) and tool-using agents to automate data management, preparation, and analysis tasks. However, the term "data agent" is currently used inconsistently, conflating simple query-responsive assistants with aspirational fully autonomous "data scientists". This ambiguity blurs capability boundaries and accountability, making it difficult for users, system builders, and regulators to reason about what a "data agent" can and cannot do. In this tutorial, we propose the first hierarchical taxonomy of data agents from Level 0 (L0, no autonomy) to Level 5 (L5, full autonomy). Building on this taxonomy, we will introduce a lifecycle- and level-driven view of data agents. We will (1) present the L0-L5 taxonomy and the key evolutionary leaps that separate simple assistants from truly autonomous data agents, (2) review…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Multimodal Machine Learning Applications