AgenticData: An Agentic Data Analytics System for Heterogeneous Data
Ji Sun, Guoliang Li, Peiyao Zhou, Yihui Ma, Jingzhe Xu, Yuan Li

TL;DR
AgenticData is an autonomous data analytics system that converts natural language questions into semantic plans. This enables efficient analysis of heterogeneous data sources without expert-written code, and it outperforms existing methods on three benchmark datasets.
Contribution
It introduces a novel multi-agent framework with semantic optimization for natural language-driven data analysis across diverse data types.
Findings
Achieved superior accuracy on three benchmark datasets.
Outperformed state-of-the-art methods in data analysis tasks.
Effectively handles both unstructured and structured data.
Abstract
Existing unstructured data analytics systems rely on experts to write code and manage complex analysis workflows, which makes them both expensive and time-consuming. To address these challenges, we introduce AgenticData, an innovative agentic data analytics system that lets users simply pose natural language (NL) questions and then autonomously analyzes data sources across multiple domains, including both unstructured and structured data. First, AgenticData employs a feedback-driven planning technique that automatically converts an NL query into a semantic plan composed of relational and semantic operators. We propose a multi-agent collaboration strategy that uses a data profiling agent to discover relevant data, a semantic cross-validation agent to perform iterative optimization based on feedback, and a smart memory agent to maintain short-term context and long-term knowledge.…
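To make the idea of a semantic plan more concrete, here is a minimal Python sketch under stated assumptions. It is not AgenticData's implementation. The operator names (`rel_filter`, `rel_join`, `rel_project`, `sem_filter`) are hypothetical. So are the keyword-based `stub_llm`, which stands in for a real LLM call, and the `validate`/`revise` functions, which stand in for the cross-validation and planning agents. The sketch shows a plan that mixes relational and semantic operators over a structured table and free-text reviews. It also shows a feedback loop in which the validator's complaint causes the plan to be revised.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]


@dataclass
class Op:
    """One step of a semantic plan: a named function from rows to rows."""
    name: str
    run: Callable[[List[Row]], List[Row]]


# --- Relational operators (exact and cheap) ---
def rel_filter(pred: Callable[[Row], bool]) -> Op:
    return Op("filter", lambda rows: [r for r in rows if pred(r)])


def rel_join(right: List[Row], key: str) -> Op:
    # Nested-loop equi-join of the incoming rows with a fixed right-hand table.
    return Op("join", lambda rows: [{**left, **r} for left in rows
                                    for r in right if left[key] == r[key]])


def rel_project(cols: List[str]) -> Op:
    return Op("project", lambda rows: [{c: r[c] for c in cols} for r in rows])


# --- Semantic operator (LLM-backed and expensive) ---
def sem_filter(column: str, instruction: str, llm: Callable[[str, str], bool]) -> Op:
    return Op("sem_filter", lambda rows: [r for r in rows if llm(instruction, str(r[column]))])


def stub_llm(instruction: str, text: str) -> bool:
    # HYPOTHETICAL stand-in for an LLM yes/no call: it ignores `instruction`
    # and flags battery complaints by keyword so the sketch runs offline.
    t = text.lower()
    return "battery" in t and any(w in t for w in ("dies", "drains", "poor"))


def execute(plan: List[Op], rows: List[Row]) -> List[Row]:
    for op in plan:
        rows = op.run(rows)
    return rows


def plan_with_feedback(plan: List[Op], rows: List[Row],
                       validate: Callable[[List[Row]], Optional[str]],
                       revise: Callable[[List[Op], str], List[Op]],
                       max_rounds: int = 3) -> Tuple[List[Op], List[Row]]:
    """Execute, validate, and revise until the validator returns no feedback."""
    for _ in range(max_rounds):
        result = execute(plan, rows)
        feedback = validate(result)  # None means the result passed
        if feedback is None:
            return plan, result
        plan = revise(plan, feedback)
    raise RuntimeError("no plan passed validation")


# Toy heterogeneous sources: a structured table and free-text reviews.
products = [{"pid": 1, "name": "Phone A", "price": 300},
            {"pid": 2, "name": "Phone B", "price": 800}]
reviews = [{"pid": 1, "text": "Battery dies by noon."},
           {"pid": 1, "text": "Love the camera."},
           {"pid": 2, "text": "Great screen, but the battery drains fast."}]

# Question: "Which phones under $500 have reviews complaining about battery life?"
# The first draft plan (hypothetically) forgets the price constraint.
initial_plan = [rel_join(reviews, "pid"),
                sem_filter("text", "Does this review complain about battery life?", stub_llm),
                rel_project(["name", "price"])]


def validate(rows: List[Row]) -> Optional[str]:
    # HYPOTHETICAL cross-validation check against a constraint in the question.
    bad = [r for r in rows if r["price"] >= 500]
    return f"{len(bad)} row(s) violate price < 500" if bad else None


def revise(plan: List[Op], feedback: str) -> List[Op]:
    # HYPOTHETICAL planner response: add the missing relational filter at the
    # front, so fewer rows reach the expensive semantic operator.
    return [rel_filter(lambda r: r["price"] < 500)] + plan


final_plan, answer = plan_with_feedback(initial_plan, products, validate, revise)
print([op.name for op in final_plan])  # ['filter', 'join', 'sem_filter', 'project']
print(answer)                          # [{'name': 'Phone A', 'price': 300}]
```

In the first round the plan returns both phones, and the validator objects to Phone B's price. The revised plan places the cheap relational filter before the LLM-backed semantic filter. Pushing exact predicates ahead of semantic ones is a common way to cut model calls. It is assumed here for illustration and is not taken from the paper's optimizer.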
