BUSClean: Open-source software for breast ultrasound image pre-processing and knowledge extraction for medical AI
Arianna Bunnell, Kailee Hung, John A. Shepherd, Peter Sadowski

TL;DR
BUSClean is an open-source tool that automates preprocessing and knowledge extraction from breast ultrasound images, improving dataset quality for AI development with high accuracy and adaptability.
Contribution
This paper introduces BUSClean, a modular open-source software package for cleaning and annotating breast ultrasound datasets, with demonstrated high accuracy and adaptability to new data.
Findings
Achieved over 95% sensitivity and 98% specificity in detecting annotations and scan irregularities.
Successfully adapted caliper detection to new data, significantly improving sensitivity and specificity.
Demonstrated robustness across internal and external datasets.
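For readers less familiar with the two metrics quoted in the findings, the following is a minimal Python sketch of how sensitivity and specificity are computed from binary labels. The function name and the toy labels are invented for illustration; they are not taken from the paper or from the BUSClean code base.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = flagged."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    # Sensitivity: fraction of truly flagged scans that were detected.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of clean scans correctly left unflagged.
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy labels, made up for this example only (not results from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

In a dataset-cleaning setting, high sensitivity means few irregular scans slip through, while high specificity means few usable scans are discarded.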
Abstract
Development of artificial intelligence (AI) for medical imaging demands curation and cleaning of large-scale clinical datasets comprising hundreds of thousands of images. Some modalities, such as mammography, contain highly standardized imaging. In contrast, breast ultrasound imaging (BUS) can contain many irregularities not indicated by scan metadata, such as enhanced scan modes, sonographer annotations, or additional views. We present an open-source software solution for automatically processing clinical BUS datasets. The algorithm performs BUS scan filtering (flagging of invalid and non-B-mode scans), cleaning (dual-view scan detection, scan area cropping, and caliper detection), and knowledge extraction (BI-RADS Labeling and Measurement fields) from sonographer annotations. Its modular design enables users to adapt it to new settings. Experiments on an internal testing dataset of…
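The three stages named in the abstract (filtering, cleaning, knowledge extraction) can be pictured as a chain of interchangeable functions. The sketch below illustrates that modular idea in Python; it is not BUSClean's actual code or API. Every name here (`ScanRecord`, `filter_stage`, `cleaning_stage`, `extraction_stage`, `run_pipeline`) is hypothetical. The sketch also assumes that detector outputs and annotation text are already available as plain fields, whereas a real tool would derive them from pixel data.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class ScanRecord:
    """Hypothetical stand-in for one BUS scan plus precomputed detector outputs."""
    scan_id: str
    is_valid: bool = True
    is_b_mode: bool = True
    has_calipers: bool = False
    annotation_text: str = ""  # assumed to be already read from the image
    flags: List[str] = field(default_factory=list)
    extracted: Dict[str, str] = field(default_factory=dict)


# A stage returns the (possibly updated) record, or None to drop it.
Stage = Callable[[ScanRecord], Optional[ScanRecord]]


def filter_stage(rec: ScanRecord) -> Optional[ScanRecord]:
    """Filtering: drop invalid and non-B-mode scans."""
    return rec if (rec.is_valid and rec.is_b_mode) else None


def cleaning_stage(rec: ScanRecord) -> Optional[ScanRecord]:
    """Cleaning: record a flag when calipers were detected (placeholder logic)."""
    if rec.has_calipers:
        rec.flags.append("calipers")
    return rec


# Illustrative pattern only: matches strings such as "1.2 cm" or "8mm".
MEASUREMENT = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm)", re.IGNORECASE)


def extraction_stage(rec: ScanRecord) -> Optional[ScanRecord]:
    """Knowledge extraction: pull a measurement out of the annotation text."""
    match = MEASUREMENT.search(rec.annotation_text)
    if match:
        rec.extracted["measurement"] = f"{match.group(1)} {match.group(2).lower()}"
    return rec


def run_pipeline(records: Iterable[ScanRecord], stages: List[Stage]) -> List[ScanRecord]:
    """Apply each stage in order; a record is dropped once any stage returns None."""
    kept = []
    for rec in records:
        current: Optional[ScanRecord] = rec
        for stage in stages:
            current = stage(current)
            if current is None:
                break
        if current is not None:
            kept.append(current)
    return kept


records = [
    ScanRecord("a", has_calipers=True, annotation_text="RT BREAST 1.2 cm"),
    ScanRecord("b", is_b_mode=False),
    ScanRecord("c", is_valid=False),
]
cleaned = run_pipeline(records, [filter_stage, cleaning_stage, extraction_stage])
```

Because each stage shares one signature, a stage can be swapped or retuned for a new site without touching the others, which is the kind of adaptation the abstract attributes to the modular design.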
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · AI in cancer detection
