TL;DR
This study reveals that certain dermatoscopic images are intrinsically ambiguous, causing both AI models and human experts to systematically fail, highlighting fundamental limits in dermatological diagnosis.
Contribution
It introduces a novel analysis of intrinsic image ambiguity affecting AI and human experts, supported by experiments and open data for reproducibility.
Findings
AI models consistently misclassify a subset of images beyond chance.
Expert dermatologists' diagnostic accuracy drops sharply on the images that the AI models misclassified, relative to a control group of images.
Image quality is identified as a key factor in intrinsic diagnostic ambiguity.
Abstract
The integration of artificial intelligence (AI), particularly Convolutional Neural Networks (CNNs), into dermatological diagnosis demonstrates substantial clinical potential. While existing literature predominantly benchmarks algorithmic performance against human experts, our study adopts a different perspective by investigating the intrinsic complexity of dermatoscopic images. Through experiments with multiple CNN architectures, we isolated a subset of images systematically misclassified across all models, a phenomenon shown statistically to exceed random chance. To determine whether these failures stem from algorithmic biases or inherent visual ambiguity, expert dermatologists independently evaluated these challenging cases alongside a control group. The results revealed a collapse in human diagnostic performance on the AI-misclassified images. First, agreement with ground-truth…
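The idea of a subset of images "misclassified across all models" beyond chance can be illustrated with a small sketch. The visible text does not say which statistical test the authors used, so the following is only one plausible reading: it assumes a null hypothesis in which models err independently, and applies a one-sided exact binomial test to the number of images that every model gets wrong. All function names and the toy data are hypothetical.

```python
import math

def all_wrong_indices(correct):
    """Return indices of images that every model misclassified.

    correct[m][i] is True if model m classified image i correctly.
    """
    n_images = len(correct[0])
    return [i for i in range(n_images) if not any(row[i] for row in correct)]

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for j in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + j * math.log(p) + (n - j) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)

def shared_failure_test(correct):
    """Compare the observed all-models-wrong count with an independence null.

    Assumption (not from the paper): under the null, each model errs
    independently, so P(all wrong) is the product of per-model error rates.
    """
    n_images = len(correct[0])
    error_rates = [1 - sum(row) / n_images for row in correct]
    p_all_wrong = math.prod(error_rates)
    hard = all_wrong_indices(correct)
    expected = n_images * p_all_wrong
    p_value = binomial_tail(len(hard), n_images, p_all_wrong)
    return hard, expected, p_value

# Toy data: 3 hypothetical models, 20 images; all three fail on images 0, 1, 2.
n = 20
wrong_sets = [{0, 1, 2, 5}, {0, 1, 2, 9}, {0, 1, 2, 14}]
correct = [[i not in w for i in range(n)] for w in wrong_sets]

hard, expected, p_value = shared_failure_test(correct)
print(hard, round(expected, 3), p_value)
```

In this toy case each model has a 20% error rate, so independent errors would put well under one image in the shared-failure set, while three are observed. A small p-value under this null suggests the overlap reflects properties of the images rather than coincidence, which is the question the dermatologist evaluation then probes.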
