Cancer-Net SCa-Synth: An Open Access Synthetically Generated 2D Skin Lesion Dataset for Skin Cancer Classification
Chi-en Amy Tai, Oustan Ding, Alexander Wong

TL;DR
This paper presents Cancer-Net SCa-Synth, a large synthetic dataset of skin lesions generated with Stable Diffusion and DreamBooth, intended to improve skin cancer classification accuracy and address class imbalance in existing datasets.
Contribution
We introduce a new synthetic skin lesion dataset, generated with Stable Diffusion and DreamBooth, to support deep learning models for skin cancer detection.
Findings
Synthetic data improves model performance on the ISIC 2020 test set
The dataset is publicly available for research use
Synthetic images help balance class distribution in datasets
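The class-balancing idea in the findings above can be illustrated with a minimal sketch. It only counts how many synthetic images each class would need in order to match the largest class; it is not the authors' pipeline. The function name `synthetic_needed`, the `target_ratio` parameter, and the toy label counts are all hypothetical and are not taken from the paper or from ISIC 2020.

```python
from collections import Counter


def synthetic_needed(labels, target_ratio=1.0):
    """Return, per class, how many synthetic images would bring that class
    up to target_ratio times the size of the largest class.

    Hypothetical helper for illustration only.
    """
    counts = Counter(labels)
    if not counts:
        return {}
    # Target size for every class, derived from the majority class.
    target = int(max(counts.values()) * target_ratio)
    # Classes already at or above the target need no synthetic images.
    return {cls: max(0, target - n) for cls, n in counts.items()}


# Toy labels (assumed numbers, not real dataset statistics).
labels = ["benign"] * 95 + ["malignant"] * 5
print(synthetic_needed(labels))  # {'benign': 0, 'malignant': 90}
```

A `target_ratio` below 1.0 would request only partial balancing, which may be preferable when synthetic images are expected to differ in quality from real ones.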
Abstract
In the United States, skin cancer ranks as the most commonly diagnosed cancer, presenting a significant public health issue due to its high rates of occurrence and the risk of serious complications if not caught early. Recent advancements in dataset curation and deep learning have shown promise in quick and accurate detection of skin cancer. However, current open-source datasets have significant class imbalances, which impede the effectiveness of these deep learning models. In healthcare, generative artificial intelligence (AI) models have been employed to create synthetic data, addressing data imbalance by augmenting underrepresented classes and enhancing the overall quality and performance of machine learning models. In this paper, we build on previous work by leveraging new advancements in generative AI, notably Stable Diffusion and DreamBooth. We introduce…
Taxonomy
Topics: Cutaneous Melanoma Detection and Management
Methods: Diffusion · Sparse Evolutionary Training
