# Diagnostic Accuracy of Artificial Intelligence Applications on a Diverse Skin Image Set

**Authors:** Amiya K Shah, Megha Agarwal

PMC · DOI: 10.7759/cureus.102354 · Cureus · 2026-01-26

## TL;DR

This study tested popular AI-powered skin apps for diagnosing skin lesions and found that they performed poorly on a set of images spanning diverse skin tones.

## Contribution

The study evaluates the diagnostic accuracy of popular AI skin apps on a skin-image database with diverse skin tones. It finds low performance and concludes that the apps should not be used as stand-alone diagnostic tools.

## Key findings

- Across all tested AI apps, overall diagnostic accuracy was 22%.
- Averaged across apps, sensitivity for detecting malignant lesions was 46.57% and specificity was 72.06%.
- Training a new ChatGPT model on 554 additional images from the same database did not improve its diagnostic accuracy.

## Abstract

Background

Several new mobile applications (apps) have been developed that utilize artificial intelligence (AI) to diagnose skin lesions.

Objective

The goal of this study was to evaluate the diagnostic accuracy of the most popular smartphone apps using a database of skin lesion images with diverse skin tones. An additional goal was to measure the apps’ sensitivity and specificity in detecting skin cancer.

Methods

A thorough search was performed in the Google Play Store and Apple App Store to find the most popular skin apps that diagnose skin lesions. We used the Stanford Diverse Dermatology Images (DDI) database to test the accuracy of the following apps: ChatGPT (OpenAI, San Francisco, CA, USA), AI skin scanner Rash Detector (I Lov Guitars Inc., Scarborough, ON, Canada), Rash ID (Appsmiths LLC, Canton, MS, USA), and Skin Scanner Dermatology & Acne (ACINA, UAB, Krokuvos, Vilnius, Lithuania). One hundred and two images spanning a range of diagnoses were selected and uploaded to each app: 51 malignant and 51 benign. We also trained a new ChatGPT model on a separate set of 554 images from the same database.

Results

All the apps had low diagnostic accuracy. The overall accuracy was 22%. When classifying benign versus malignant diagnoses, the apps had an average sensitivity of 46.57% and an average specificity of 72.06%. The average positive predictive value was 67.44%, and the average negative predictive value was 58.06%. In our study, training ChatGPT did not improve its diagnostic accuracy.
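The four benign-versus-malignant metrics above all come from a 2×2 confusion matrix with "malignant" as the positive class. The Python sketch below shows how they are computed. The counts in it are hypothetical, not taken from the paper. This summary does not say whether "average" means an unweighted mean of per-app values or a pooled value, so `mean_over_apps` assumes the former. The hypothetical helper `ppv_at_prevalence` applies Bayes' theorem to show that PPV depends on the malignant share of the test set, which was 50% here (51 of 102 images).

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Benign-vs-malignant counts for one app; 'malignant' is the positive class."""
    tp: int  # malignant images the app labelled malignant
    fn: int  # malignant images the app labelled benign
    tn: int  # benign images the app labelled benign
    fp: int  # benign images the app labelled malignant


def _ratio(num: int, den: int) -> float:
    # Return NaN rather than raising when a denominator is empty,
    # e.g. an app that never labels anything malignant has no defined PPV.
    return num / den if den else math.nan


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Sensitivity, specificity, PPV and NPV for one app."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


def mean_over_apps(per_app: list[dict[str, float]]) -> dict[str, float]:
    # Unweighted mean of each metric across apps (an assumed reading of "average").
    return {k: sum(m[k] for m in per_app) / len(per_app) for k in per_app[0]}


def ppv_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    # Bayes' theorem: the share of "malignant" calls that are truly malignant
    # when a fraction `prevalence` of the images are malignant.
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical counts for two apps, each shown 51 malignant and 51 benign images.
    app_a = binary_metrics(ConfusionCounts(tp=24, fn=27, tn=37, fp=14))
    app_b = binary_metrics(ConfusionCounts(tp=20, fn=31, tn=40, fp=11))
    print({k: round(v, 4) for k, v in mean_over_apps([app_a, app_b]).items()})
    # Same sensitivity and specificity as app A, but only 5% of images malignant.
    print(round(ppv_at_prevalence(app_a["sensitivity"], app_a["specificity"], 0.05), 4))
```

Averaging per-app PPVs generally gives a different number than substituting averaged sensitivity and specificity into the Bayes formula. For this reason, the averaging method matters when comparing the reported PPV and NPV with other studies.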

Conclusions

ChatGPT, Rash Detector, Rash ID, and Skin Scanner Dermatology & Acne performed poorly at diagnosing skin lesions from a database with diverse skin tones. These apps should not be used as stand-alone diagnostic tools.

## Linked entities

- **Diseases:** skin cancer (MONDO:0002898)

## Full-text entities

- **Diseases:** seborrheic keratosis (MESH:D017492), lesion (MESH:D009059), Skin cancer (MESH:D012878), lip diseases (MESH:D008047), skin type I-II (MESH:D006969), Kaposi sarcoma (MESH:D012514), squamous cell carcinoma (MESH:D002294), melanocytic nevi (MESH:D009508), sebaceous carcinoma (MESH:D012626), lipoma (MESH:D008067), inflammatory (MESH:D007249), Melanoma (MESH:D008545), verruca vulgaris (MESH:D014860), post-inflammatory hyperpigmentation (MESH:D017495), Fitzpatrick skin (MESH:D012871), skin type V-VI (MESH:C536047), Cancers (MESH:D009369), actinic keratosis (MESH:D055623), onychomycosis (MESH:D014009), ecchymosis (MESH:D004438), subcutaneous lymphoma (MESH:D008223), bacterial infections (MESH:D001424), mycosis fungoides (MESH:D009182), mycosis fungoidosis (MESH:D015821), metastatic carcinoma (MESH:C538445), erythema (MESH:D004890), Acne (MESH:D000152), DDI (MESH:C564543), basal cell carcinoma (MESH:D002280), angioma (MESH:D006391), neurofibroma (MESH:D009455), epidermal nevus (MESH:C580062), epidermal cyst (MESH:D004814), acrochordon (MESH:D058249), subcutaneous T-cell lymphoma (MESH:D016399), dermatofibroma (MESH:D018219), pyogenic granuloma (MESH:D017789), skin type III-IV (MESH:C000631847), infections (MESH:D007239), keloid (MESH:D007627), Dermatology (MESH:D000168), solar lentigo (MESH:D007911)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12936399/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12936399/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/PMC12936399/full.md

---
Source: https://tomesphere.com/paper/PMC12936399