DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models

Yiming Huang; Jianwen Luo; Yan Yu; Yitong Zhang; Fangyu Lei; Yifan Wei; Shizhu He; Lifu Huang; Xiao Liu; Jun Zhao; Kang Liu

arXiv:2410.07331·cs.CL·October 14, 2024

TL;DR

DA-Code is a challenging new benchmark for evaluating large language models on complex, real-world data science tasks that require advanced coding, grounding, and planning skills.

Contribution

The paper introduces DA-Code, a benchmark for agent-based data science code generation built from diverse tasks on real data, along with a new baseline model.

Findings

01. The best current LLMs achieve only 30.5% accuracy on DA-Code.

02. DA-Code covers complex data wrangling and analytics tasks.

03. The benchmark is scalable and aligned with real-world scenarios.

Abstract

We introduce DA-Code, a code generation benchmark specifically designed to assess LLMs on agent-based data science tasks. This benchmark features three core elements. First, the tasks within DA-Code are inherently challenging, setting them apart from traditional code generation tasks and demanding advanced coding skills in grounding and planning. Second, examples in DA-Code are all based on real and diverse data, covering a wide range of complex data wrangling and analytics tasks. Third, to solve the tasks, the models must use complex data science programming languages to perform intricate data processing and derive the answers. We set up the benchmark in a controllable and executable environment that aligns with real-world data analysis scenarios and is scalable. The annotators meticulously design the evaluation suite to ensure the accuracy and robustness of the evaluation. We…

Taxonomy

Topics: Multi-Agent Systems and Negotiation

Methods: Sparse Evolutionary Training