AI Transparency Atlas: Framework, Scoring, and Real-Time Model Card Evaluation Pipeline
Akhmadillo Mamirov, Faiaz Azmain, Hanyu Wang

TL;DR
This paper introduces a framework and automated pipeline for evaluating the transparency of AI model documentation, and finds significant gaps in safety disclosures across frontier models and Hugging Face model cards.
Contribution
It develops a standardized, weighted transparency scoring system and an automated evaluation pipeline to assess AI documentation consistency and safety disclosures.
Findings
Frontier models achieve around 80% compliance.
Most providers fall below 60% transparency score.
Safety-critical disclosures are notably deficient.
Abstract
AI model documentation is fragmented across platforms and inconsistent in structure, preventing policymakers, auditors, and users from reliably assessing safety claims, data provenance, and version-level changes. We analyzed documentation from five frontier models (Gemini 3, Grok 4.1, Llama 4, GPT-5, and Claude 4.5) and 100 Hugging Face model cards, identifying 947 unique section names with extreme naming variation. Usage information alone appeared under 97 distinct labels. Using the EU AI Act Annex IV and the Stanford Transparency Index as baselines, we developed a weighted transparency framework with 8 sections and 23 subsections that prioritizes safety-critical disclosures (Safety Evaluation: 25%, Critical Risk: 20%) over technical specifications. We implemented an automated multi-agent pipeline that extracts documentation from public sources and scores completeness through LLM-based…
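The weighted scoring idea described in the abstract can be illustrated with a minimal sketch. Only two weights are given in the abstract (Safety Evaluation: 25%, Critical Risk: 20%); the six remaining section names and their weights below are hypothetical placeholders chosen so that the eight weights sum to 1, and the function name `weighted_score` is likewise an assumption, not the authors' implementation. Per-section completeness is assumed to be a value between 0 and 1.

```python
import math
from typing import Dict

# Section weights. Only the first two come from the abstract; the other six
# are hypothetical placeholders that fill the remaining 55%.
WEIGHTS: Dict[str, float] = {
    "safety_evaluation": 0.25,  # stated in the abstract
    "critical_risk": 0.20,      # stated in the abstract
    "placeholder_section_3": 0.10,  # hypothetical
    "placeholder_section_4": 0.10,  # hypothetical
    "placeholder_section_5": 0.10,  # hypothetical
    "placeholder_section_6": 0.10,  # hypothetical
    "placeholder_section_7": 0.10,  # hypothetical
    "placeholder_section_8": 0.05,  # hypothetical
}

# Guard against a misconfigured weight table.
assert math.isclose(sum(WEIGHTS.values()), 1.0)


def weighted_score(completeness: Dict[str, float]) -> float:
    """Return a transparency score in percent (0 to 100).

    `completeness` maps a section name to the fraction of that section's
    disclosures that are present (0.0 to 1.0). Sections missing from the
    mapping are treated as entirely undisclosed.
    """
    total = 0.0
    for section, weight in WEIGHTS.items():
        value = completeness.get(section, 0.0)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"completeness for {section!r} must be in [0, 1]")
        total += weight * value
    return 100.0 * total


if __name__ == "__main__":
    # A model card that fully documents safety evaluation and nothing else.
    print(weighted_score({"safety_evaluation": 1.0}))
```

Under these assumed weights, a card that omits both safety-critical sections cannot score above 55%, however complete its remaining sections are, which is the prioritization effect the abstract describes.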
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)
