Semi-Supervised Siamese Network for Identifying Bad Data in Medical Imaging Datasets
Niamh Belton, Aonghus Lawlor, Kathleen M. Curran

TL;DR
This paper introduces a semi-supervised Siamese network that identifies bad data in medical imaging datasets, improving data quality for model training while requiring only a small reference set of images checked by a non-expert reviewer.
Contribution
The novel semi-supervised Siamese network method efficiently detects bad medical images using only a small reference set and outperforms previous approaches.
Findings
Achieved an AUC of 0.989 in bad data detection
Requires only a small pool of reference images, which a non-expert can review
Effective in identifying images lacking major anatomical structures
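For readers unfamiliar with the metric, the AUC reported above can be read as the probability that a randomly chosen bad image receives a higher anomaly score than a randomly chosen good one. The sketch below computes it from that pairwise definition. The function name `auc_from_scores` and the example scores are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def auc_from_scores(scores, is_bad):
    """Pairwise (Mann-Whitney) AUC: fraction of bad/good pairs ranked correctly.

    scores : anomaly score per image (higher = more likely bad)
    is_bad : ground-truth labels, truthy for bad images
    """
    scores = np.asarray(scores, dtype=float)
    is_bad = np.asarray(is_bad, dtype=bool)
    pos = scores[is_bad]    # scores of bad images
    neg = scores[~is_bad]   # scores of good images
    # Compare every bad image against every good image.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    # Ties count as half a correctly ranked pair.
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Hypothetical scores: both bad images outrank both good ones.
perfect = auc_from_scores([0.1, 0.2, 0.9, 0.8], [0, 0, 1, 1])
```

The pairwise form is quadratic in the number of images, which is fine for illustration; a rank-based implementation would be preferable for large datasets.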
Abstract
Noisy data present in medical imaging datasets can often aid the development of robust models that are equipped to handle real-world data. However, if the bad data contains insufficient anatomical information, it can have a severe negative effect on the model's performance. We propose a novel methodology using a semi-supervised Siamese network to identify bad data. This method requires only a small pool of 'reference' medical images to be reviewed by a non-expert human to ensure the major anatomical structures are present in the Field of View. The model trains on this reference set and identifies bad data by using the Siamese network to compute the distance between the reference set and all other medical images in the dataset. This methodology achieves an Area Under the Curve (AUC) of 0.989 for identifying bad data. Code will be available at https://git.io/JYFuV.
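The scoring step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `embed` is a hypothetical stand-in for the trained Siamese branch (here a fixed random projection), and averaging the distances to the reference set is an assumed aggregation rule that the abstract does not specify.

```python
import numpy as np

def embed(images, weights):
    """Hypothetical stand-in for the shared Siamese branch.

    Flattens each image, applies a fixed projection, and L2-normalises.
    A real model would be a trained network, not a random projection.
    """
    flat = images.reshape(len(images), -1)
    z = np.tanh(flat @ weights)
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)

def anomaly_scores(ref_emb, query_emb):
    """Mean Euclidean distance from each query embedding to the reference set."""
    diff = query_emb[:, None, :] - ref_emb[None, :, :]
    dists = np.linalg.norm(diff, axis=2)   # shape: (n_query, n_ref)
    return dists.mean(axis=1)              # assumed aggregation: mean

# Toy data (illustrative only): reference and good images share a pattern,
# while "bad" images are built from its negation.
rng = np.random.default_rng(0)
pattern = rng.normal(size=(8, 8))
weights = rng.normal(size=(64, 16)) / 8.0
reference = pattern + 0.01 * rng.normal(size=(5, 8, 8))
good = pattern + 0.01 * rng.normal(size=(4, 8, 8))
bad = -pattern + 0.01 * rng.normal(size=(3, 8, 8))

scores = anomaly_scores(embed(reference, weights),
                        embed(np.concatenate([good, bad]), weights))
```

Images whose score exceeds a chosen threshold would be flagged as bad data. The threshold itself is a design choice that this sketch leaves open.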
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Machine Learning in Healthcare · Artificial Intelligence in Healthcare
Methods: Siamese Network
