TL;DR
VIDS is an open, machine-enforceable standard for medical imaging datasets that covers dataset structure, annotation provenance, quality documentation, and ML readiness in a single framework.
Contribution
VIDS provides an open specification with 21 machine-enforceable validation rules across two compliance profiles, targeting the annotation provenance and quality documentation gaps that DICOM and BIDS leave unaddressed.
Findings
Benchmarking shows existing datasets meet only 20-39% of VIDS dimensions.
The LIDC-Hybrid-100 dataset is fully compliant with VIDS.
Provenance and quality gaps are the largest systematic issues in current datasets.
Abstract
Medical imaging AI development is fundamentally dependent on annotated datasets, yet no existing standard provides machine-enforceable validation across dataset structure, annotation provenance, quality documentation, and ML readiness within a single framework. DICOM standardizes image acquisition, storage, and communication at the individual study level. BIDS organizes neuroimaging research datasets with consistent naming conventions. Neither addresses the curation layer, viz., who annotated what, when, with what tool, and to what quality standard. This paper presents VIDS (Verified Imaging Dataset Standard), an open specification that defines folder layout, file naming, annotation provenance schemas, quality documentation, and 21 machine-enforceable validation rules across two compliance profiles. VIDS uses NIfTI as a canonical working format while preserving full DICOM metadata in…
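To make the idea of a machine-enforceable provenance check concrete, here is a minimal sketch in Python. It is not the VIDS schema or validator: the field names (`annotator`, `target`, `timestamp`, `tool`, `quality_standard`) and the function `check_provenance` are hypothetical, chosen only to mirror the abstract's "who annotated what, when, with what tool, and to what quality standard".

```python
from datetime import datetime

# Hypothetical field names, NOT taken from the VIDS specification.
# They mirror "who annotated what, when, with what tool, and to what
# quality standard" from the abstract.
REQUIRED_FIELDS = ("annotator", "target", "timestamp", "tool", "quality_standard")


def check_provenance(record: dict) -> list[str]:
    """Return a list of violations; an empty list means the record passes."""
    errors = []

    # Rule 1: every required field is present and is a non-empty string.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"missing or empty field: {field}")

    # Rule 2: if a timestamp is given, it must parse as an ISO 8601 datetime.
    ts = record.get("timestamp")
    if isinstance(ts, str) and ts.strip():
        try:
            datetime.fromisoformat(ts)
        except ValueError:
            errors.append("timestamp is not ISO 8601")

    return errors


good = {
    "annotator": "reader_01",
    "target": "case_0001/nodule_mask.nii.gz",
    "timestamp": "2024-03-01T10:15:00",
    "tool": "example-annotation-tool 1.0",
    "quality_standard": "dual-read consensus",
}
bad = {"annotator": "reader_01", "timestamp": "yesterday"}

print(check_provenance(good))
print(check_provenance(bad))
```

Returning a list of violations rather than raising on the first failure lets a dataset-level validator report every problem in one pass, which is the behaviour one would want from a rule-based compliance check.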
