AIDABench: AI Data Analytics Benchmark

Yibo Yang; Fei Lei; Yixuan Sun; Yantao Zeng; Chengguang Lv; Jiancao Hong; Jiaojiao Tian; Tianyu Qiu; Xin Wang; Yanbing Chen; Yanjie Li; Zheng Pan; Xiaochen Zhou; Guanzhou Chen; Haoran Lv; Yuning Xu; Yue Ou; Haodong Liu; Shiqi He; Anya Jia; Yulei Xin; Huan Wu; Liang Liu; Jiaye Ge; Jianxin Dong; Dahua Lin; Wenxiu Sun

arXiv:2603.15636 · cs.AI · March 30, 2026

TL;DR

AIDABench is a comprehensive benchmark with over 600 real-world document analysis tasks across question answering, data visualization, and file generation, designed to evaluate AI systems' end-to-end data analytics capabilities.

Contribution

The paper introduces a challenging, realistic benchmark covering diverse data types and tasks, providing a new standard for evaluating AI performance in practical data analytics scenarios.

Findings

1. Current AI models achieve only 59.43% pass-at-1 on the benchmark.
2. Tasks are so challenging that even human experts need 1-2 hours per question with AI assistance.
3. Analysis highlights key failure modes and challenges for future AI research.
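
For readers unfamiliar with the pass-at-1 figure quoted in the findings, the following is a minimal sketch of how such a score is commonly computed: the fraction of tasks whose first attempt is judged correct. This excerpt does not describe the benchmark's actual scoring procedure, so the function name `pass_at_1`, the task IDs, and the pass/fail data are all hypothetical and purely illustrative.

```python
from typing import Mapping, Sequence


def pass_at_1(results: Mapping[str, Sequence[bool]]) -> float:
    """Return the fraction of tasks whose FIRST attempt passed.

    `results` maps a task ID to the ordered pass/fail outcomes of its
    attempts. Only the first outcome counts; a task with no attempts
    is treated as a failure.
    """
    if not results:
        raise ValueError("no tasks to score")
    passed = sum(1 for attempts in results.values() if attempts and attempts[0])
    return passed / len(results)


# Hypothetical outcomes for four made-up tasks (not real benchmark data).
example = {
    "qa-001": [True],
    "viz-001": [False, True],  # passes only on the second try: not counted
    "file-001": [True],
    "qa-002": [False],
}

score = pass_at_1(example)
print(f"pass@1 = {score:.2%}")  # prints "pass@1 = 50.00%"
```

Under this definition, a later successful retry does not help a task, which is why `viz-001` counts as a failure in the example.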

Abstract

As AI-driven document understanding and processing tools become increasingly prevalent in real-world applications, the need for rigorous evaluation standards has grown increasingly urgent. Existing benchmarks and evaluations often focus on isolated capabilities or simplified scenarios, failing to capture the end-to-end task effectiveness required in practical settings. To address this gap, we introduce AIDABench, a comprehensive benchmark for evaluating AI systems on complex data analytics tasks in an end-to-end manner. AIDABench encompasses 600+ diverse document analysis tasks across three core capability dimensions: question answering, data visualization, and file generation. These tasks are grounded in realistic scenarios involving heterogeneous data types, including spreadsheets, databases, financial reports, and operational records, and reflect analytical demands across diverse…

Code & Models

Repositories

MichaelYang-lyx/AIDABench
github

Datasets

MichaelYang-lyx/AIDA
dataset · 793 downloads
