# A Multi-Dimensional Framework for Data Quality Assurance in Cancer Imaging Repositories

**Authors:** Olga Tsave, Alexandra Kosvyra, Dimitrios T. Filos, Dimitris Th. Fotopoulos, Ioanna Chouvarda

PMC · DOI: 10.3390/cancers17193213 · Cancers · 2025-10-01

## TL;DR

The INCISIVE project presents a framework for pre-validating cancer imaging and clinical (meta)data before the data are used to develop AI services for healthcare.

## Contribution

The contribution is a data validation framework for cancer imaging repositories that assesses clinical (meta)data and imaging data across five dimensions: completeness, validity, consistency, integrity, and fairness.

## Key findings

- The pre-validation process identified data quality issues such as missing clinical information, inconsistent formatting, and subgroup imbalances.
- Structured data entry and standardized protocols demonstrated added value for data quality and interoperability.
- The authors present the approach as a transferable model for curating large-scale, multimodal medical data with attention to equity and reuse.
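
As a rough illustration of the first finding, the sketch below flags missing fields and inconsistently formatted dates in clinical records. It is not the INCISIVE implementation: the field names (`patient_id`, `cancer_type`, `diagnosis_date`), the `YYYY-MM-DD` date rule, and the functions `is_iso_date` and `check_record` are all hypothetical, chosen only to show what a completeness and validity check might look like.

```python
from datetime import datetime

# Hypothetical required fields; the actual INCISIVE templates are not
# described in this summary.
REQUIRED_FIELDS = ("patient_id", "cancer_type", "diagnosis_date")


def is_iso_date(value):
    """Return True if value is a string that parses as YYYY-MM-DD."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False


def check_record(record):
    """Return a list of issue labels for one clinical record (a dict)."""
    issues = []
    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            issues.append(f"missing:{field}")
    # Validity: a date that is present must follow the assumed format.
    date = record.get("diagnosis_date")
    if date not in (None, "") and not is_iso_date(date):
        issues.append("invalid_format:diagnosis_date")
    return issues
```

Returning a list of labels per record, rather than raising on the first problem, makes it possible to aggregate issue counts per data provider, which is one plausible way to report quality across a multicentric repository.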

## Abstract

In cancer imaging research, data collection, integration, and utilization to generate multicentric data repositories pose a series of significant challenges such as data harmonization, data quality, utility, and overall suitability for reuse. These challenges directly affect the reliability and robustness of research outcomes, making systematic approaches essential. This work presents the INCISIVE project approach for assessing the quality of cancer imaging and clinical (meta)data in a structured and transparent way. The proposed methodology serves as a guiding map to ensure the creation and maintenance of a high-quality data repository, which is a crucial factor for generalizable and trustworthy AI-services development and their safe adoption in healthcare practice.

Background/Objectives: Cancer remains a leading global cause of death, with breast, lung, colorectal, and prostate cancers being among the most prevalent. The integration of Artificial Intelligence (AI) into cancer imaging research offers opportunities for earlier diagnosis and personalized treatment. However, the effectiveness of AI models depends critically on the quality, standardization, and fairness of the input data. The EU-funded INCISIVE project aimed to create a federated, pan-European repository of imaging and clinical data for cancer cases, with a key objective to develop a robust framework for pre-validating data prior to its use in AI development.

Methods: We propose a data validation framework to assess clinical (meta)data and imaging data across five dimensions: completeness, validity, consistency, integrity, and fairness. The framework includes procedures for deduplication, annotation verification, DICOM metadata analysis, and anonymization compliance.

Results: The pre-validation process identified key data quality issues, such as missing clinical information, inconsistent formatting, and subgroup imbalances, while also demonstrating the added value of structured data entry and standardized protocols.

Conclusions: This structured framework addresses common challenges in curating large-scale, multimodal medical data. By applying this approach, the INCISIVE project ensures data quality, interoperability, and equity, providing a transferable model for future health data repositories supporting AI research in oncology.
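
Two of the procedures named in the Methods, deduplication and the fairness dimension, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method used in the paper: duplicates are detected here by exact SHA-256 content hash, fairness is reduced to a simple subgroup share threshold, and the names `find_duplicates`, `underrepresented`, and `min_share` are hypothetical.

```python
import hashlib
from collections import Counter


def find_duplicates(files):
    """Group file names that share identical content.

    `files` maps a name to its raw bytes. Exact-hash matching is an
    assumption; it will not catch re-encoded or resampled copies.
    """
    by_hash = {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        by_hash.setdefault(digest, []).append(name)
    return [sorted(names) for names in by_hash.values() if len(names) > 1]


def underrepresented(records, attribute, min_share=0.2):
    """Return subgroup values whose share of records is below min_share.

    The 0.2 default is an arbitrary illustrative threshold.
    """
    counts = Counter(r.get(attribute, "unknown") for r in records)
    total = sum(counts.values())
    return sorted(g for g, n in counts.items() if n / total < min_share)
```

For example, calling `underrepresented(records, "sex")` on a cohort of nine female and one male record would flag `"M"`, since its share (0.1) falls below the assumed threshold.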

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989), lung cancer (MONDO:0005138), colorectal cancer (MONDO:0005575), prostate cancer (MONDO:0005159)

## Full-text entities

- **Diseases:** breast, lung, colorectal, and prostate cancers (MESH:D001943), death (MESH:D003643), Cancer (MESH:D009369)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12524141/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12524141/full.md

## References

49 references — full list in the complete paper: https://tomesphere.com/paper/PMC12524141/full.md

---
Source: https://tomesphere.com/paper/PMC12524141