The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions

Philipp Tschandl; Cliff Rosendahl; Harald Kittler

arXiv:1803.10417 · cs.CV · November 27, 2018

TL;DR

The HAM10000 dataset provides a large, diverse collection of dermatoscopic images of pigmented skin lesions, facilitating improved training and benchmarking of machine learning models for skin cancer diagnosis.

Contribution

This paper introduces the HAM10000 dataset, a large, multi-source dermatoscopic image collection with diverse acquisition methods, enabling better machine learning training and comparison with human experts.

Findings

01. The dataset contains 10015 images covering the major pigmented lesion categories.

02. More than 50% of lesions are pathologically confirmed.

03. The dataset is publicly available for research and benchmarking.
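
A share such as the pathology-confirmed fraction above can be checked against a metadata table. The following is a minimal sketch, not the authors' code: the column names `lesion_id` and `dx_type`, the label `histo`, and the toy rows are all assumptions made for illustration. It counts distinct lesions rather than images, since one lesion may appear in several images.

```python
import csv
import io

# Hypothetical toy metadata. Column names, labels, and rows are assumptions
# for illustration only; they are not taken from the paper.
SAMPLE_CSV = """lesion_id,image_id,dx_type
L1,IMG_1,histo
L1,IMG_2,histo
L2,IMG_3,follow_up
L3,IMG_4,consensus
L4,IMG_5,histo
"""


def pathology_confirmed_fraction(csv_text, confirmed_label="histo"):
    """Return the fraction of distinct lesions confirmed by pathology."""
    lesion_types = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep the first ground-truth type seen for each lesion so that
        # lesions with several images are counted only once.
        lesion_types.setdefault(row["lesion_id"], row["dx_type"])
    if not lesion_types:
        raise ValueError("metadata table has no rows")
    confirmed = sum(1 for t in lesion_types.values() if t == confirmed_label)
    return confirmed / len(lesion_types)


fraction = pathology_confirmed_fraction(SAMPLE_CSV)
print(f"{fraction:.0%} of toy lesions are pathology-confirmed")
```

Counting per lesion instead of per image keeps lesions with many images from dominating the estimate; the same grouping would matter when splitting data into training and validation sets.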

Abstract

Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images. We tackle this problem by releasing the HAM10000 ("Human Against Machine with 10000 training images") dataset. We collected dermatoscopic images from different populations acquired and stored by different modalities. Given this diversity we had to apply different acquisition and cleaning methods and developed semi-automatic workflows utilizing specifically trained neural networks. The final dataset consists of 10015 dermatoscopic images which are released as a training set for academic machine learning purposes and are publicly available through the ISIC archive. This benchmark dataset can be used for machine learning and for comparisons with human experts. Cases include a representative collection of all…
