Challenges of building medical image datasets for development of deep learning software in stroke
Alessandro Fontanella, Wenwen Li, Grant Mair, Antreas Antoniou, Eleanor Platt, Chloe Martin, Paul Armitage, Emanuele Trucco, Joanna Wardlaw, Amos Storkey

TL;DR
This paper presents a semi-automatic pipeline for standardizing heterogeneous brain CT datasets to facilitate deep learning development, addressing challenges like orientation, image type, and redundancy, with a focus on improving dataset preparation efficiency.
Contribution
The authors develop and describe a comprehensive pipeline that automates the standardization of clinical brain CT datasets for deep learning, handling various data inconsistencies and reducing manual effort.
Findings
Pipeline processed 5,868 of 10,659 CT image datasets (reported as 45%)
93% acceptance rate among suitable scans
Identified key reasons for dataset rejection
Abstract
Despite the large amount of brain CT data generated in clinical practice, the availability of CT datasets for deep learning (DL) research is currently limited. Furthermore, the data can be insufficiently or improperly prepared for machine learning and thus lead to spurious and irreproducible analyses. This lack of access to comprehensive and diverse datasets poses a significant challenge for the development of DL algorithms. In this work, we propose a complete semi-automatic pipeline to address the challenges of preparing a clinical brain CT dataset for DL analysis and describe the process of standardising this heterogeneous dataset. Challenges include handling image sets with different orientations (axial, sagittal, coronal), different image types (to view soft tissues or bones) and dimensions, and removing redundant background. The final pipeline was able to process 5,868/10,659 (45%)…
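Two of the standardisation challenges named in the abstract, telling axial, sagittal and coronal image sets apart and removing redundant background, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the orientation is available as the six direction cosines of the DICOM Image Orientation (Patient) attribute and that voxel values are in Hounsfield units; the function names and the -500 HU air threshold are hypothetical choices made for illustration.

```python
import numpy as np


def classify_orientation(image_orientation):
    """Label a slice stack as axial, sagittal or coronal.

    `image_orientation` holds six direction cosines (row vector, then
    column vector), as in the DICOM Image Orientation (Patient) attribute.
    The slice normal is their cross product; its dominant axis gives the plane.
    """
    row = np.asarray(image_orientation[:3], dtype=float)
    col = np.asarray(image_orientation[3:], dtype=float)
    normal = np.cross(row, col)
    # Patient axes: x = left/right, y = front/back, z = feet/head.
    labels = ("sagittal", "coronal", "axial")
    return labels[int(np.argmax(np.abs(normal)))]


def crop_background(volume, air_threshold=-500.0):
    """Crop a volume to the bounding box of voxels denser than air.

    `air_threshold` is an assumed cut-off in Hounsfield units, not a value
    taken from the paper.
    """
    mask = volume > air_threshold
    if not mask.any():
        # Nothing but background: return the input unchanged.
        return volume
    slices = []
    for axis in range(volume.ndim):
        # Collapse every other axis to find where content exists along this one.
        other_axes = tuple(a for a in range(volume.ndim) if a != axis)
        occupied = np.flatnonzero(mask.any(axis=other_axes))
        slices.append(slice(occupied[0], occupied[-1] + 1))
    return volume[tuple(slices)]


# Usage on a synthetic volume: air everywhere except a small block of tissue.
demo = np.full((4, 10, 10), -1000.0)
demo[1:3, 2:7, 3:9] = 40.0
cropped = crop_background(demo)
plane = classify_orientation([1, 0, 0, 0, 1, 0])
```

A fixed threshold is the simplest possible choice. Real clinical scans also contain the scanner table and head holder, which would need additional handling that this sketch does not attempt.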
Taxonomy
Topics: Acute Ischemic Stroke Management · Medical Imaging and Analysis · Advanced Neural Network Applications
