Can Agentic AI Match the Performance of Human Data Scientists?

An Luo; Jin Du; Fangqiao Tian; Xun Xian; Robert Specht; Ganghua Wang; Xuan Bi; Charles Fleming; Jayanth Srinivasa; Ashish Kundu; Mingyi Hong; Jie Ding

arXiv:2512.20959·cs.LG·December 25, 2025

TL;DR

This paper investigates whether agentic AI systems can match human data scientists in complex prediction tasks, revealing current limitations due to lack of domain-specific knowledge integration.

Contribution

The study demonstrates that current agentic AI systems relying on generic workflows underperform compared to humans in tasks requiring domain insight, highlighting a key area for future improvement.

Findings

01. Agentic AI struggles with tasks requiring hidden domain-specific variables.

02. Humans outperform AI by leveraging domain knowledge.

03. Current AI workflows lack effective domain knowledge integration.

Abstract

Data science plays a critical role in transforming complex data into actionable insights across numerous domains. Recent developments in large language models (LLMs) have significantly automated data science workflows, but a fundamental question persists: Can these agentic AI systems truly match the performance of human data scientists who routinely leverage domain-specific knowledge? We explore this question by designing a prediction task where a crucial latent variable is hidden in relevant image data instead of tabular features. As a result, agentic AI that generates generic code for modeling tabular data cannot perform well, while human experts can identify the important hidden variable using domain knowledge. We demonstrate this idea with a synthetic dataset for property insurance. Our experiments show that agentic AI that relies on a generic analytics workflow falls short of…
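The setup described in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's dataset or method. It assumes a hypothetical insurance-style record with one tabular feature (`age`), a latent binary variable (`damaged`) that never appears in the table, and a 16-pixel stand-in "image" whose brightness depends on that variable. All names, coefficients, and thresholds are invented for illustration. A tabular-only regression is compared with one that first recovers the latent flag from the image.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def rmse(preds, ys):
    """Root mean squared error between predictions and targets."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys))


def make_record(rng):
    """Return (age, image, loss); all quantities are hypothetical."""
    age = rng.uniform(0, 50)          # tabular feature, visible to any workflow
    damaged = rng.random() < 0.5      # latent variable, absent from the table
    base = 0.3 if damaged else 0.7    # assumed: damaged roofs look darker
    image = [min(1.0, max(0.0, rng.gauss(base, 0.1))) for _ in range(16)]
    loss = 2.0 * age + (40.0 if damaged else 0.0) + rng.gauss(0, 2.0)
    return age, image, loss


def extract_flag(image):
    """Domain-informed step: read the latent flag off mean brightness."""
    return sum(image) / len(image) < 0.5


rng = random.Random(0)
records = [make_record(rng) for _ in range(2000)]
train, test = records[:1500], records[1500:]
test_losses = [r[2] for r in test]

# Generic workflow: regress loss on the tabular feature only.
a, b = fit_line([r[0] for r in train], [r[2] for r in train])
rmse_tabular = rmse([a * r[0] + b for r in test], test_losses)

# Domain-informed workflow: recover the flag from the image, then fit one
# line per recovered group.
models = {}
for flag in (False, True):
    group = [r for r in train if extract_flag(r[1]) == flag]
    models[flag] = fit_line([r[0] for r in group], [r[2] for r in group])

preds = []
for age, image, _ in test:
    a_g, b_g = models[extract_flag(image)]
    preds.append(a_g * age + b_g)
rmse_informed = rmse(preds, test_losses)

print(f"tabular-only RMSE:    {rmse_tabular:.2f}")
print(f"domain-informed RMSE: {rmse_informed:.2f}")
```

In this toy setting the tabular-only model cannot explain the variance contributed by the hidden flag, so its error stays large regardless of how well the line is fit. The gap here is a property of the invented data-generating process and says nothing quantitative about the paper's results.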

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Topic Modeling