Building RadiologyNET: Unsupervised annotation of a large-scale multimodal medical database
Mateja Napravnik, Franko Hržić, Sebastian Tschauner, Ivan Štajduhar

TL;DR
This paper presents an unsupervised method for automatically annotating a large-scale database of medical radiology images by combining multimodal data sources with clustering, reducing the manual annotation effort needed to prepare such data for machine learning applications.
Contribution
The study introduces a novel multimodal unsupervised annotation pipeline that combines image, metadata, and diagnosis data to cluster and label large medical image datasets automatically.
Findings
Fusing multimodal features yields the best clustering performance.
The pipeline successfully clusters over 1.3 million images into 50 meaningful groups.
Cluster homogeneity and mutual information indicate high-quality annotations.
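The fuse-then-cluster idea summarised above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the embeddings are synthetic, the fusion is a plain concatenation of standardised feature blocks, k-means stands in for whatever clustering method the authors used, and the names `fuse_and_cluster` and `make_toy_data` are hypothetical. Only the evaluation idea (homogeneity and mutual information against reference labels) follows the summary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import homogeneity_score, normalized_mutual_info_score
from sklearn.preprocessing import StandardScaler


def fuse_and_cluster(image_emb, metadata_emb, text_emb, n_clusters, seed=0):
    """Standardise each modality, concatenate the blocks, and run k-means.

    Assumption: simple concatenation is used as the fusion step.
    """
    blocks = [
        StandardScaler().fit_transform(block)
        for block in (image_emb, metadata_emb, text_emb)
    ]
    fused = np.hstack(blocks)  # shape: (n_samples, sum of block widths)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(fused)


def make_toy_data(n_per_group=100, n_groups=4, seed=0):
    """Synthetic stand-in for image, metadata, and diagnosis embeddings."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(n_groups), n_per_group)

    def modality(dim):
        # One random centre per group, plus unit-variance noise per sample.
        centres = rng.normal(scale=5.0, size=(n_groups, dim))
        return centres[labels] + rng.normal(size=(labels.size, dim))

    return modality(16), modality(4), modality(8), labels


image_emb, metadata_emb, text_emb, reference = make_toy_data()
assigned = fuse_and_cluster(image_emb, metadata_emb, text_emb, n_clusters=4)

# Both scores lie in [0, 1]; higher means clusters agree better with the reference labels.
homogeneity = homogeneity_score(reference, assigned)
nmi = normalized_mutual_info_score(reference, assigned)
print(f"homogeneity={homogeneity:.3f}  NMI={nmi:.3f}")
```

In a real setting the three arrays would come from the chosen feature extractors for images, DICOM metadata, and narrative diagnoses, and the number of clusters would be selected by the evaluation rather than fixed in advance.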
Abstract
Background and objective: The use of machine learning in medical diagnosis and treatment has grown significantly in recent years through the development of computer-aided diagnosis systems, which often rely on annotated medical radiology images. However, the availability of large annotated image datasets remains a major obstacle, since annotation is time-consuming and costly. This paper explores how to automatically annotate a database of medical radiology images with regard to their semantic similarity. Material and methods: An automated, unsupervised approach is used to construct a large annotated dataset of medical radiology images originating from Clinical Hospital Centre Rijeka, Croatia, utilising multimodal sources, including images, DICOM metadata, and narrative diagnoses. Several appropriate feature extractors are tested for each of the data…
Taxonomy
Topics: Artificial Intelligence in Healthcare · AI in cancer detection · Biomedical Text Mining and Ontologies
