A Novel Perspective for Multi-modal Multi-label Skin Lesion Classification
Yuan Zhang, Yutong Xie, Hu Wang, Jodie C Avery, M Louise Hull, and Gustavo Carneiro

TL;DR
This paper presents SkinM2Former, a multi-modal, multi-label skin lesion classifier that effectively fuses clinical data and images using transformer-based models, addressing imbalanced learning and label correlation issues, and outperforming existing methods.
Contribution
The paper introduces SkinM2Former, a novel transformer-based model that fuses multiple data modalities and learns label correlations for improved skin lesion classification.
Findings
Achieved 77.27% mean average accuracy on the Derm7pt dataset.
Outperformed state-of-the-art methods in multi-modal multi-label classification.
Effectively handled imbalanced data and multi-label correlations.
Abstract
The efficacy of deep learning-based Computer-Aided Diagnosis (CAD) methods for skin diseases relies on analyzing multiple data modalities (i.e., clinical+dermoscopic images, and patient metadata) and addressing the challenges of multi-label classification. Current approaches tend to rely on limited multi-modal techniques and treat the multi-label problem as a multiple multi-class problem, overlooking issues related to imbalanced learning and multi-label correlation. This paper introduces the innovative Skin Lesion Classifier, utilizing a Multi-modal Multi-label TransFormer-based model (SkinM2Former). For multi-modal analysis, we introduce the Tri-Modal Cross-attention Transformer (TMCT) that fuses the three image and metadata modalities at various feature levels of a transformer encoder. For multi-label classification, we introduce a multi-head attention (MHA) module to learn…
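To illustrate the kind of cross-attention fusion the abstract mentions, here is a minimal sketch in plain Python. It is not the authors' TMCT: the abstract does not give that module's details, so this shows only generic single-head scaled dot-product cross-attention, in which tokens from one modality act as queries over tokens from another. The function names, the toy token values, and the omission of learned projection matrices are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention (no learned projections).

    queries: tokens from one modality, each a list of d floats
    keys, values: tokens from another modality (same count as each other)
    Returns one fused vector per query token.
    """
    d = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this query token to every key token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value tokens.
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return fused


# Hypothetical toy tokens standing in for two image modalities.
dermoscopic_tokens = [[1.0, 0.0], [0.0, 1.0]]
clinical_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Dermoscopic tokens attend over clinical tokens.
fused = cross_attention(dermoscopic_tokens, clinical_tokens, clinical_tokens)
```

Each output row is a convex combination of the value tokens, so a query draws most heavily on the tokens of the other modality that it matches best. A real model would add learned query, key, and value projections, multiple heads, and residual connections.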
Taxonomy
Topics: Cutaneous Melanoma Detection and Management
Methods: Attention Is All You Need · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer · Dropout · Dense Connections · Residual Connection
