Introducing SDICE: An Index for Assessing Diversity of Synthetic Medical Datasets
Mohammed Talha Alam, Raza Imam, Mohammad Areeb Qazi, Asim Ukaye, and Karthik Nandakumar

TL;DR
The paper introduces SDICE, a new index that quantifies the diversity of synthetic medical datasets by comparing similarity distributions with real datasets using a contrastive encoder.
Contribution
This work presents the SDICE index, a novel metric for evaluating diversity in synthetic datasets based on similarity distribution analysis with pre-trained contrastive encoders.
Findings
SDICE effectively measures diversity in synthetic medical datasets.
Experiments on MIMIC-chest X-ray and ImageNet validate SDICE's usefulness.
SDICE provides a normalized, comparable diversity metric across domains.
Abstract
Advancements in generative modeling are pushing the state-of-the-art in synthetic medical image generation. These synthetic images can serve as an effective data augmentation method to aid the development of more accurate machine learning models for medical image analysis. While the fidelity of these synthetic images has progressively increased, the diversity of these images is an understudied phenomenon. In this work, we propose the SDICE index, which is based on the characterization of similarity distributions induced by a contrastive encoder. Given a synthetic dataset and a reference dataset of real images, the SDICE index measures the distance between the similarity score distributions of original and synthetic images, where the similarity scores are estimated using a pre-trained contrastive encoder. This distance is then normalized using an exponential function to provide a…
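The abstract describes the computation only at a high level, and the text above is cut off before the normalization is fully specified. The Python sketch below illustrates the general idea under explicit assumptions. It assumes that embeddings from some pre-trained contrastive encoder are already available as NumPy arrays. It summarizes each dataset by the distribution of pairwise cosine similarities, compares the two distributions with a 1-D Wasserstein distance, and maps that distance into (0, 1] with exp(-gamma * d). The choice of pairwise similarities, the Wasserstein distance, the parameter `gamma`, and the function names (`pairwise_cosine_similarities`, `wasserstein_1d`, `sdice_like_index`) are all hypothetical. The paper's exact distance measure and normalization may differ.

```python
import numpy as np


def pairwise_cosine_similarities(embeddings: np.ndarray) -> np.ndarray:
    """Return the cosine similarity of every unordered pair of rows (diagonal excluded)."""
    # L2-normalize each embedding so that a dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    rows, cols = np.triu_indices(len(embeddings), k=1)
    return sims[rows, cols]


def wasserstein_1d(u: np.ndarray, v: np.ndarray, n_quantiles: int = 1000) -> float:
    """Approximate the 1-D Wasserstein-1 distance by comparing quantile functions."""
    qs = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return float(np.mean(np.abs(np.quantile(u, qs) - np.quantile(v, qs))))


def sdice_like_index(real_emb: np.ndarray, synth_emb: np.ndarray, gamma: float = 1.0) -> float:
    """Hypothetical SDICE-style score in (0, 1]; 1.0 means identical similarity distributions.

    Assumption: distance = Wasserstein-1 between pairwise-similarity distributions,
    normalized as exp(-gamma * distance). This is an illustration, not the paper's code.
    """
    d = wasserstein_1d(pairwise_cosine_similarities(real_emb),
                       pairwise_cosine_similarities(synth_emb))
    # Exponential normalization maps any distance d >= 0 into (0, 1].
    return float(np.exp(-gamma * d))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random vectors stand in for contrastive-encoder embeddings (hypothetical data).
    real = rng.normal(size=(200, 128))
    diverse_synth = rng.normal(size=(200, 128))
    # A "mode-collapsed" synthetic set: near-copies of a single embedding.
    collapsed_synth = rng.normal(size=128) + 0.05 * rng.normal(size=(200, 128))
    print("diverse synthetic:  ", sdice_like_index(real, diverse_synth))
    print("collapsed synthetic:", sdice_like_index(real, collapsed_synth))
```

In this toy setup, the collapsed synthetic set has pairwise similarities clustered near 1, far from the real set's distribution, so its score is much lower than that of the diverse set. The exponential mapping keeps scores bounded, which is one way to obtain the normalized, cross-domain comparability the findings describe.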
