VIDS: A Verified Imaging Dataset Standard for Medical AI

Joan S. Muthu; John Shalen

arXiv:2604.17525 · eess.IV · April 21, 2026

TL;DR

VIDS is an open, machine-enforceable standard for medical imaging datasets. It specifies folder layout, file naming, annotation provenance schemas, and quality documentation, and checks them with 21 validation rules across two compliance profiles.

Contribution

The paper provides an open specification and validation framework for medical imaging datasets, covering the annotation provenance and quality documentation that DICOM and BIDS leave unaddressed.

Findings

01. Benchmarking shows that existing datasets meet only 20-39% of VIDS dimensions.

02. The LIDC-Hybrid-100 dataset is fully compliant with VIDS.

03. Provenance and quality documentation gaps are the largest systematic issues in current datasets.

Abstract

Medical imaging AI development is fundamentally dependent on annotated datasets, yet no existing standard provides machine-enforceable validation across dataset structure, annotation provenance, quality documentation, and ML readiness within a single framework. DICOM standardizes image acquisition, storage, and communication at the individual study level. BIDS organizes neuroimaging research datasets with consistent naming conventions. Neither addresses the curation layer, viz., who annotated what, when, with what tool, and to what quality standard. This paper presents VIDS (Verified Imaging Dataset Standard), an open specification that defines folder layout, file naming, annotation provenance schemas, quality documentation, and 21 machine-enforceable validation rules across two compliance profiles. VIDS uses NIfTI as a canonical working format while preserving full DICOM metadata in…
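
The curation-layer question raised in the abstract (who annotated what, when, with what tool, and to what quality standard) lends itself to a machine check. The following is a minimal Python sketch of what such a check might look like. It is not the VIDS schema or any of its 21 rules: the field names in REQUIRED_FIELDS and the function check_provenance are hypothetical, chosen only to mirror the five questions in the abstract.

```python
from datetime import datetime
from typing import List

# Hypothetical field names, one per question in the abstract:
# who, what, when, with what tool, to what quality standard.
# The real VIDS provenance schema may name and structure these differently.
REQUIRED_FIELDS = ("annotator_id", "target", "timestamp", "tool", "quality_standard")


def check_provenance(record: dict) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []

    # Every field must be present and be a non-empty string.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")

    # If a timestamp was supplied, it must parse as an ISO 8601 date-time.
    ts = record.get("timestamp")
    if isinstance(ts, str) and ts.strip():
        try:
            datetime.fromisoformat(ts)
        except ValueError:
            problems.append("timestamp is not ISO 8601")

    return problems


if __name__ == "__main__":
    example = {
        "annotator_id": "reader-01",
        "target": "case-0001/nodule-mask",
        "timestamp": "2026-04-21T10:30:00",
        "tool": "example-annotation-tool",
        "quality_standard": "double-read",
    }
    print(check_provenance(example))
```

Returning a list of problems instead of raising on the first failure lets a validator report every violation in one pass, which is usually what a dataset curator wants from a compliance check.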

Code & Models

Repositories

https://doi.org/10.5281/zenodo.19582717
github