SciHorizon-DataEVA: An Agentic System for AI-Readiness Evaluation of Heterogeneous Scientific Data

Dianyu Liu; Chuan Qin; Xi Chen; Xiaohan Li; Wenxi Xu; Yuyang Wang; Xin Chen; Yuanchun Zhou; Hengshu Zhu

arXiv:2604.26645·cs.AI·April 30, 2026

TL;DR

SciHorizon-DataEVA is an agentic system for systematically evaluating the AI-readiness of heterogeneous scientific datasets, with the aim of supporting AI integration in scientific discovery.

Contribution

The paper introduces the Sci-TQA2 principles and a hierarchical multi-agent evaluation approach for scalable, fine-grained assessment of the AI-readiness of scientific data.

Findings

01. Effective evaluation across multiple scientific domains

02. Demonstrated scalability and reliability of the system

03. Enabled principled assessment of heterogeneous datasets

Abstract

AI-for-Science (AI4Science) is increasingly transforming scientific discovery by embedding machine learning models into prediction, simulation, and hypothesis generation workflows across domains. However, the effectiveness of these models is fundamentally constrained by the AI-readiness of scientific data, for which no scalable and systematic evaluation mechanism currently exists. In this work, we propose SciHorizon-DataEVA, a novel agentic system for scalable AI-readiness evaluation of heterogeneous scientific data. At the evaluation-criteria level, we introduce the Sci-TQA2 principles, which organize AI-readiness into four complementary dimensions: Governance Trustworthiness, Data Quality, AI Compatibility, and Scientific Adaptability. Each dimension is decomposed into measurable atomic elements that enable fine-grained and executable assessment. To operationalize these principles at…
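As a rough illustration of how the four Sci-TQA2 dimensions and their atomic elements might be represented, the sketch below models each dimension as a list of scored elements. Only the four dimension names come from the abstract. The element names, the 0 to 1 scoring scale, the unweighted-mean aggregation, and the class and function names (`AtomicElement`, `Dimension`, `readiness_report`) are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List

# Dimension names as listed in the abstract.
DIMENSIONS = (
    "Governance Trustworthiness",
    "Data Quality",
    "AI Compatibility",
    "Scientific Adaptability",
)


@dataclass
class AtomicElement:
    """One measurable check. The name and the scale are hypothetical."""
    name: str
    score: float  # assumed scale: 0.0 (not satisfied) to 1.0 (fully satisfied)

    def __post_init__(self) -> None:
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score for {self.name!r} must lie in [0, 1]")


@dataclass
class Dimension:
    """A Sci-TQA2 dimension holding its atomic elements."""
    name: str
    elements: List[AtomicElement] = field(default_factory=list)

    def score(self) -> float:
        # Assumption: a plain unweighted mean; the paper may aggregate differently.
        if not self.elements:
            raise ValueError(f"dimension {self.name!r} has no elements")
        return mean(e.score for e in self.elements)


def readiness_report(dimensions: List[Dimension]) -> Dict[str, float]:
    """Map each dimension name to its aggregated score."""
    return {d.name: d.score() for d in dimensions}


# Hypothetical element names and scores, for illustration only.
report = readiness_report([
    Dimension(DIMENSIONS[0], [AtomicElement("license_declared", 1.0)]),
    Dimension(DIMENSIONS[1], [AtomicElement("missing_value_rate", 0.5),
                              AtomicElement("schema_consistency", 1.0)]),
    Dimension(DIMENSIONS[2], [AtomicElement("machine_readable_format", 1.0)]),
    Dimension(DIMENSIONS[3], [AtomicElement("domain_metadata_present", 0.0)]),
])
```

Keeping per-dimension scores separate, rather than collapsing them into a single number, matches the abstract's emphasis on fine-grained assessment, although the actual reporting format used by SciHorizon-DataEVA is not described in this excerpt.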
