AISafetyBenchExplorer: A Metric-Aware Catalogue of AI Safety Benchmarks Reveals Fragmented Measurement and Weak Benchmark Governance

Abiodun A. Solanke

arXiv:2604.12875 · cs.AI · April 24, 2026

TL;DR

AISafetyBenchExplorer is a structured catalogue of 195 AI safety benchmarks released between 2018 and 2026. It shows that safety measurement is fragmented, lacks standardization, and needs stronger benchmark governance.

Contribution

The paper introduces AISafetyBenchExplorer, a structured, meta-analytical catalogue that organizes and analyzes AI safety benchmarks to improve measurement coherence.

Findings

01. Benchmark proliferation has outpaced measurement standardization.

02. Medium-complexity benchmarks dominate (94 of 195), and most benchmarks are English-only.

03. Many repositories and datasets are stale or inconsistently maintained.

Abstract

The rapid expansion of large language model (LLM) safety evaluation has produced a substantial benchmark ecosystem, but not a correspondingly coherent measurement ecosystem. We present AISafetyBenchExplorer, a structured catalogue of 195 AI safety benchmarks released between 2018 and 2026, organized through a multi-sheet schema that records benchmark-level metadata, metric-level definitions, benchmark-paper metadata, and repository activity. This design enables meta-analysis not only of what benchmarks exist, but also of how safety is operationalized, aggregated, and judged across the literature. Using the updated catalogue, we identify a central structural problem: benchmark proliferation has outpaced measurement standardization. The current landscape is dominated by medium-complexity benchmarks (94/195), while only 7 benchmarks occupy the Popular tier. The workbook further reports…
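
The multi-sheet schema and the headline counts in the abstract can be illustrated with a minimal sketch. Every class and field name below is a hypothetical stand-in, since the abstract does not list the catalogue's actual column names; only the counts 94, 7, and 195 come from the abstract itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetricDefinition:
    """One row of a metric-level sheet (field names are assumptions)."""
    name: str
    aggregation: str  # how scores are combined, e.g. "mean"
    judge: str        # who or what scores outputs, e.g. "human"


@dataclass
class BenchmarkRecord:
    """One row of a benchmark-level sheet (field names are assumptions)."""
    name: str
    release_year: int
    complexity: str   # e.g. "medium"
    tier: str         # e.g. "Popular"
    metrics: List[MetricDefinition] = field(default_factory=list)
    paper_id: Optional[str] = None          # link to paper metadata
    repo_last_commit: Optional[str] = None  # link to repository activity


def share(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 1)


# Counts reported in the abstract.
TOTAL_BENCHMARKS = 195
medium_share = share(94, TOTAL_BENCHMARKS)   # medium-complexity benchmarks
popular_share = share(7, TOTAL_BENCHMARKS)   # benchmarks in the Popular tier
```

Under these counts, medium-complexity benchmarks make up about 48.2% of the catalogue, just under half, and the Popular tier holds about 3.6%.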
