AssistedDS: Benchmarking How External Domain Knowledge Assists LLMs in Automated Data Science

An Luo; Xun Xian; Jin Du; Fangqiao Tian; Ganghua Wang; Ming Zhong; Shengchun Zhao; Xuan Bi; Zirui Liu; Jiawei Zhou; Jayanth Srinivasa; Ashish Kundu; Charles Fleming; Mingyi Hong; Jie Ding

arXiv:2506.13992 · cs.LG · October 24, 2025





TL;DR

This paper introduces AssistedDS, a benchmark to evaluate how well large language models can utilize external domain knowledge in automated data science tasks, revealing significant limitations in their critical evaluation abilities.

Contribution

The paper presents AssistedDS, a novel benchmark with synthetic and real datasets to systematically assess LLMs' handling of domain knowledge in data science workflows.

Findings

01

LLMs often uncritically adopt provided information, harming performance with adversarial content.

02

Helpful guidance alone cannot prevent negative effects of adversarial data.

03

LLMs struggle with time-series data, feature engineering, and categorical variables in Kaggle datasets.

Abstract

Large language models (LLMs) have advanced the automation of data science workflows. Yet it remains unclear whether they can critically leverage external domain knowledge as human data scientists do in practice. To answer this question, we introduce AssistedDS (Assisted Data Science), a benchmark designed to systematically evaluate how LLMs handle domain knowledge in tabular prediction tasks. AssistedDS features both synthetic datasets with explicitly known generative mechanisms and real-world Kaggle competitions, each accompanied by curated bundles of helpful and adversarial documents. These documents provide domain-specific insights into data cleaning, feature engineering, and model selection. We assess state-of-the-art LLMs on their ability to discern and apply beneficial versus harmful domain knowledge, evaluating submission validity, information recall, and predictive performance.…
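The abstract names submission validity and predictive performance as evaluation axes but does not spell out the scoring code. As a toy illustration of why synthetic data with a known generative mechanism makes the effect of helpful versus adversarial guidance measurable, the sketch below assumes a hypothetical mechanism y = 2·x1² + noise (x2 is pure noise), treats each "document" as a recommended feature transformation, fits a one-feature least-squares model, checks that the submission is valid, and reports test MSE. The mechanism, the two documents, and every function name (`make_synthetic`, `helpful_doc`, `adversarial_doc`, `fit_1d`, `is_valid_submission`, `evaluate`) are hypothetical and are not taken from AssistedDS.

```python
import math
import random


def make_synthetic(n, seed):
    """Hypothetical generative mechanism: y = 2 * x1**2 + Gaussian noise; x2 is irrelevant."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.uniform(-3.0, 3.0)
        x2 = rng.uniform(-3.0, 3.0)
        y = 2.0 * x1 ** 2 + rng.gauss(0.0, 0.5)
        rows.append({"x1": x1, "x2": x2, "y": y})
    return rows


def helpful_doc(row):
    """Hypothetical helpful advice: engineer x1 squared (matches the true mechanism)."""
    return row["x1"] ** 2


def adversarial_doc(row):
    """Hypothetical adversarial advice: rely on x2, which carries no signal."""
    return row["x2"]


def fit_1d(xs, ys):
    """Ordinary least squares for y ~ a + b * x; returns (a, b)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def is_valid_submission(preds, n_test):
    """Toy validity check: exactly one finite prediction per test row."""
    return len(preds) == n_test and all(math.isfinite(p) for p in preds)


def evaluate(feature, train, test):
    """Fit on the recommended feature; return test MSE, or None if the submission is invalid."""
    a, b = fit_1d([feature(r) for r in train], [r["y"] for r in train])
    preds = [a + b * feature(r) for r in test]
    if not is_valid_submission(preds, len(test)):
        return None
    return sum((p - r["y"]) ** 2 for p, r in zip(preds, test)) / len(test)


train = make_synthetic(500, seed=0)
test = make_synthetic(200, seed=1)
mse_helpful = evaluate(helpful_doc, train, test)
mse_adversarial = evaluate(adversarial_doc, train, test)
print(f"helpful MSE={mse_helpful:.3f}, adversarial MSE={mse_adversarial:.3f}")
```

Because the true mechanism is known, the gap between the two MSE values can be attributed to the guidance that was followed rather than to the data; in this toy setup the helpful feature should land near the noise variance (0.25), while the adversarial one should be far worse.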


Code & Models

Datasets

lainmn/AssistedDS-Synthetic · dataset · 16 downloads

Videos

AssistedDS: Benchmarking How External Domain Knowledge Assists LLMs in Automated Data Science · Underline

Taxonomy

Topics: Data Mining Algorithms and Applications · Data Quality and Management · Machine Learning and Data Classification