Can Agentic AI Match the Performance of Human Data Scientists?
An Luo, Jin Du, Fangqiao Tian, Xun Xian, Robert Specht, Ganghua Wang, Xuan Bi, Charles Fleming, Jayanth Srinivasa, Ashish Kundu, Mingyi Hong, Jie Ding

TL;DR
This paper investigates whether agentic AI systems can match human data scientists on complex prediction tasks, and finds that current systems fall short because they do not integrate domain-specific knowledge.
Contribution
The study shows that current agentic AI systems, which rely on generic workflows, underperform humans on tasks that require domain insight, and it identifies domain knowledge integration as a key area for future improvement.
Findings
Agentic AI struggles with tasks requiring hidden domain-specific variables.
Humans outperform AI by leveraging domain knowledge.
Current AI workflows lack effective domain knowledge integration.
Abstract
Data science plays a critical role in transforming complex data into actionable insights across numerous domains. Recent developments in large language models (LLMs) have significantly automated data science workflows, but a fundamental question persists: Can these agentic AI systems truly match the performance of human data scientists who routinely leverage domain-specific knowledge? We explore this question by designing a prediction task where a crucial latent variable is hidden in relevant image data instead of tabular features. As a result, agentic AI that generates generic code for modeling tabular data cannot perform well, while human experts can identify the important hidden variable using domain knowledge. We demonstrate this idea with a synthetic dataset for property insurance. Our experiments show that agentic AI that relies on a generic analytics workflow falls short of…
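The task design described in the abstract can be illustrated with a small simulation. This is a minimal sketch and not the authors' dataset or code: the feature count, the coefficients, the binary latent variable `z`, and the helper names `simulate` and `ols_rmse` are all assumptions made for illustration. Here `z` stands in for a variable that would only be recoverable from image data, so a model fitted on the tabular columns alone never sees it.

```python
import numpy as np


def simulate(n=2000, seed=0):
    """Generate a toy dataset with a latent variable absent from the tabular features."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 3))  # observed tabular features
    # Hypothetical latent variable; in the paper's setting it would be hidden in images.
    z = rng.integers(0, 2, size=n).astype(float)
    # Assumed data-generating process: linear in X, plus a large effect of z.
    y = X @ np.array([1.0, -2.0, 0.5]) + 4.0 * z + rng.normal(scale=0.5, size=n)
    return X, z, y


def ols_rmse(features, y, n_train):
    """Fit ordinary least squares on the first n_train rows; return RMSE on the rest."""
    A = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
    resid = y[n_train:] - A[n_train:] @ coef
    return float(np.sqrt(np.mean(resid ** 2)))


X, z, y = simulate()
rmse_tabular = ols_rmse(X, y, n_train=1500)  # generic tabular-only workflow
rmse_with_latent = ols_rmse(np.column_stack([X, z]), y, n_train=1500)  # latent variable recovered
print(f"tabular only: {rmse_tabular:.2f}, with latent variable: {rmse_with_latent:.2f}")
```

Under these assumptions the tabular-only fit carries the unexplained effect of `z` in its error, while the fit that includes `z` approaches the noise level. The sketch says nothing about how the latent variable would be extracted from images, which is the step the abstract attributes to human domain knowledge.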
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Topic Modeling
