Transforming a clinical study database into a structured database adapted to artificial intelligence applications
Thibault Sauron, Carole Lazarus, Camille Kurtz, Florence Cloppet, Isabelle Thomassin Naggara, Edouard Poncelet, Aurelie Jalaguier-Coudray, Ingrid Millet, Valerie Juhan, Corinne Balleyguier, Caroline Malhaire, Nicolas Perrot, Marc Bazot, Patrice Taourel

TL;DR
This paper presents a method to convert clinical trial MRI data into a structured database suitable for training AI models.
Contribution
The paper introduces a novel curation methodology and open-source tools for adapting clinical trial data for AI applications.
Findings
A curation process was developed to simplify and harmonize clinical trial MRI data for AI use.
The number of files and folders was significantly reduced, and only essential DICOM fields were retained.
Quality control and harmonization steps improved data consistency for AI model training.
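As a rough illustration of the field-reduction and quality-control findings above, the sketch below filters DICOM-like headers down to a whitelist and flags records with missing values. It is a minimal sketch under stated assumptions, not the authors' tooling: headers are modelled as plain dictionaries, the whitelist `ESSENTIAL_FIELDS` and the helper names are hypothetical, and the fields actually retained in the paper are not listed in this summary.

```python
# Hypothetical whitelist of DICOM keywords; the fields the authors actually kept are not given here.
ESSENTIAL_FIELDS = ("PatientID", "Modality", "SeriesDescription", "PixelSpacing", "SliceThickness")


def keep_essential_fields(header, essential=ESSENTIAL_FIELDS):
    """Return a copy of a DICOM-like header restricted to the whitelisted fields."""
    return {name: header[name] for name in essential if name in header}


def missing_fields(header, essential=ESSENTIAL_FIELDS):
    """Quality-control helper: list whitelisted fields that are absent or empty."""
    return [name for name in essential if header.get(name) in (None, "")]


def curate(headers, essential=ESSENTIAL_FIELDS):
    """Split headers into (kept, rejected) according to the quality-control check."""
    kept, rejected = [], []
    for header in headers:
        slim = keep_essential_fields(header, essential)
        problems = missing_fields(slim, essential)
        if problems:
            # Keep the original header and the reason so the case can be reviewed by hand.
            rejected.append((header, problems))
        else:
            kept.append(slim)
    return kept, rejected


# Made-up example records, for illustration only.
example_headers = [
    {"PatientID": "P001", "Modality": "MR", "SeriesDescription": "T2 SAG",
     "PixelSpacing": "0.6\\0.6", "SliceThickness": "4.0",
     "InstitutionName": "Hospital A", "StationName": "MR01"},
    {"PatientID": "P002", "Modality": "MR", "SeriesDescription": "",
     "PixelSpacing": "0.7\\0.7", "SliceThickness": "5.0"},
]
kept, rejected = curate(example_headers)
```

Here the first record passes and loses its non-whitelisted fields, while the second is set aside because its series description is empty. In a real pipeline the dictionaries would be replaced by headers read with a DICOM library.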
Abstract
Medical imaging databases suitable for training machine learning/computer vision algorithms are scarce, limiting the potential for development and generalisation of clinical tools. Clinical trial databases are one potential source, known for their high-quality data and reliable annotations. However, they are not tailored to the needs of machine learning or deep learning models. Our objective was to develop a methodology and tools that enable the curation of these databases specifically for the training or testing of artificial intelligence tools. MRIs from the French centres of the EURAD clinical trial (MRI of women with pelvic adnexal lesions) were used to constitute the database. We developed the steps required to curate a clinical trial database: definition of inclusion and exclusion criteria, removal of unnecessary data according to the principle of parsimony, quality control, and…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Radiomics and Machine Learning in Medical Imaging · AI in cancer detection
