MetaGAI: A Large-Scale and High-Quality Benchmark for Generative AI Model and Data Card Generation

Haoxuan Zhang; Ruochi Li; Yang Zhang; Zhenni Liang; Junhua Ding; Ting Xiao; Haihua Chen

arXiv:2604.23539 · cs.AI · April 28, 2026

TL;DR

MetaGAI introduces a large-scale benchmark dataset for evaluating automated generation of Model and Data Cards in Generative AI, addressing scalability and quality issues with a multi-agent framework and comprehensive human validation.

Contribution

The paper presents MetaGAI, a novel benchmark with 2,541 verified document triplets, employing a multi-agent system and human-in-the-loop validation for systematic evaluation of generation methods.

Findings

01

Sparse Mixture-of-Experts architectures outperform other architectures on cost-quality efficiency.

02

A fundamental trade-off exists between faithfulness and completeness in generated documents.

03

MetaGAI enables benchmarking, training, and analysis of automated documentation methods at scale.
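
Finding 02 can be made concrete with a toy scoring scheme. This is a minimal sketch, not MetaGAI's evaluation protocol. The names `score_card`, `CardScores`, and `SOURCE_FACTS`, the field names, and the exact-match rule are all hypothetical. Here faithfulness is the share of filled fields whose value the sources support, and completeness is the share of source-documented fields the card fills. A generator that guesses values for missing fields raises completeness at the expense of faithfulness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardScores:
    faithfulness: float  # share of filled fields whose value matches the sources
    completeness: float  # share of source-documented fields the card fills


def score_card(generated: dict[str, str], source_facts: dict[str, str]) -> CardScores:
    """Hypothetical field-level metrics; exact string match stands in for a real judge."""
    filled = {k: v for k, v in generated.items() if v.strip()}
    supported = sum(1 for k, v in filled.items() if source_facts.get(k) == v)
    covered = sum(1 for k in source_facts if k in filled)
    # An empty card asserts nothing, so it is treated as vacuously faithful.
    faithfulness = supported / len(filled) if filled else 1.0
    completeness = covered / len(source_facts) if source_facts else 1.0
    return CardScores(faithfulness, completeness)


# Made-up facts that the paper, repository, and Hugging Face page actually state.
SOURCE_FACTS = {
    "license": "mit",
    "base_model": "llama-2-7b",
    "training_data": "c4",
    "intended_use": "research",
}

# A cautious card fills only what it is sure of; an eager card guesses the rest.
cautious = {"license": "mit", "base_model": "llama-2-7b"}
eager = {**cautious, "training_data": "the pile", "intended_use": "commercial chat"}

print(score_card(cautious, SOURCE_FACTS))  # CardScores(faithfulness=1.0, completeness=0.5)
print(score_card(eager, SOURCE_FACTS))     # CardScores(faithfulness=0.5, completeness=1.0)
```

Neither card dominates the other in this toy setup. That is one way a faithfulness/completeness trade-off can arise, though the paper's own metrics may be defined differently.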

Abstract

The rapid proliferation of Generative AI necessitates rigorous documentation standards for transparency and governance. However, manual creation of Model and Data Cards is not scalable, while automated approaches lack large-scale, high-fidelity benchmarks for systematic evaluation. We introduce MetaGAI, a comprehensive benchmark comprising 2,541 verified document triplets constructed through semantic triangulation of academic papers, GitHub repositories, and Hugging Face artifacts. Unlike prior single-source datasets, MetaGAI employs a multi-agent framework with specialized Retriever, Generator, and Editor agents, validated through four-dimensional human-in-the-loop assessment, including human evaluation of editor-refined ground truth. We establish a robust evaluation protocol combining automated metrics with validated LLM-as-a-Judge frameworks. Extensive analysis reveals that sparse…
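
The abstract names Retriever, Generator, and Editor agents but does not describe their internals. The following is a toy sketch of how such a pipeline might be wired, not the authors' implementation. `SourceTriplet`, `retriever`, `generator`, `editor`, and `build_card` are hypothetical names that do not come from the MetaGAI repository. Keyword matching stands in for retrieval, copying a passage stands in for an LLM call, and the "evidence from at least two of the three sources" rule is only an illustrative reading of semantic triangulation. Fields that fail the check are routed to human review.

```python
from dataclasses import dataclass


@dataclass
class SourceTriplet:
    """Hypothetical container for one paper / GitHub / Hugging Face triplet."""
    paper: str
    github_readme: str
    hf_card: str

    def passages(self) -> list[tuple[str, str]]:
        # Naive sentence split on "."; each passage is tagged with its origin.
        out: list[tuple[str, str]] = []
        for origin, text in (("paper", self.paper),
                             ("github", self.github_readme),
                             ("huggingface", self.hf_card)):
            out.extend((origin, s.strip()) for s in text.split(".") if s.strip())
        return out


def retriever(triplet: SourceTriplet, terms: list[str]) -> list[tuple[str, str]]:
    """Stand-in Retriever: keep passages that mention any query term."""
    return [(o, p) for o, p in triplet.passages() if any(t in p.lower() for t in terms)]


def generator(evidence: list[tuple[str, str]]) -> str:
    """Stand-in Generator: a real system would prompt an LLM; this copies the first passage."""
    return evidence[0][1] if evidence else ""


def editor(evidence: list[tuple[str, str]], min_sources: int = 2) -> bool:
    """Stand-in Editor: accept a field only if evidence spans >= min_sources origins."""
    return len({origin for origin, _ in evidence}) >= min_sources


def build_card(triplet: SourceTriplet,
               field_queries: dict[str, list[str]]) -> tuple[dict[str, str], list[str]]:
    card: dict[str, str] = {}
    needs_review: list[str] = []  # fields routed to a human validator
    for name, terms in field_queries.items():
        evidence = retriever(triplet, terms)
        draft = generator(evidence)
        if draft and editor(evidence):
            card[name] = draft
        else:
            needs_review.append(name)
    return card, needs_review


triplet = SourceTriplet(
    paper="We release the model under the MIT license. The model was trained on C4.",
    github_readme="License: MIT. Install with pip.",
    hf_card="This model is released under the MIT license.",
)
card, needs_review = build_card(
    triplet, {"license": ["license"], "training_data": ["trained on", "dataset"]}
)
print(card)          # {'license': 'We release the model under the MIT license'}
print(needs_review)  # ['training_data']
```

In this example the license is stated in all three sources and is accepted. The training data appears only in the paper, so it is flagged for human review rather than written into the card.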

Code & Models

Repositories

haoxuan-unt2024/MetaGAI-Benchmark (GitHub)