TFormer: A throughout fusion transformer for multi-modal skin lesion diagnosis
Yilan Zhang, Fengying Xie, Jianqi Chen

TL;DR
TFormer introduces a pure transformer-based framework for multi-modal skin lesion diagnosis, effectively fusing heterogeneous data and capturing representative features across modalities for improved diagnostic accuracy.
Contribution
The paper proposes a novel Throughout Fusion Transformer (TFormer) that enhances multi-modal data integration using hierarchical multi-modal transformer blocks and a post-fusion module.
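This summary does not describe the internals of the multi-modal transformer blocks, so the following is only a minimal sketch of generic cross-attention, one common way to fuse token sets of different lengths, and not the authors' implementation. The function names, the identity query/key/value projections, and the toy token values are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Each query token attends over all tokens of the other modality.

    queries:     n vectors of dimension d (one modality)
    keys_values: m vectors of dimension d (the other modality); m may differ from n
    Assumption: identity Q/K/V projections, single head, to keep the sketch small.
    """
    d = len(queries[0])
    fused = []
    for q in queries:
        # Scaled dot-product score between this query and every key token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values]
        weights = softmax(scores)
        # Weighted average of the other modality's tokens.
        context = [sum(w * v[i] for w, v in zip(weights, keys_values)) for i in range(d)]
        # Residual connection: original token plus cross-modal context.
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused


# Hypothetical token sets with different lengths (no spatial alignment assumed).
dermoscopy_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
clinical_tokens = [[0.2, 0.8], [0.9, 0.1]]

fused = cross_attention(dermoscopy_tokens, clinical_tokens)
```

Because attention weights are computed per token pair, the output keeps one fused token per query token regardless of how many tokens the other modality supplies, which is why this family of operations suits inputs with mismatched spatial resolution.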
Findings
Effective multi-modal fusion across spatially unaligned data
Improved feature representation in shallow layers
Enhanced diagnosis performance on skin lesion datasets
Abstract
Multi-modal skin lesion diagnosis (MSLD) has achieved remarkable success with modern computer-aided diagnosis (CAD) technology based on deep convolutions. However, information aggregation across modalities in MSLD remains challenging due to severely unaligned spatial resolution (e.g., dermoscopic image and clinical image) and heterogeneous data (e.g., dermoscopic image and patients' meta-data). Limited by their intrinsically local attention, most recent MSLD pipelines using pure convolutions struggle to capture representative features in shallow layers. As a result, the fusion across different modalities is usually done at the end of the pipeline, even at the last layer, leading to insufficient information aggregation. To tackle the issue, we introduce a pure transformer-based method, which we refer to as "Throughout Fusion Transformer (TFormer)", for sufficient information integration in…
Taxonomy
Topics: Cutaneous Melanoma Detection and Management · Optical Coherence Tomography Applications
Methods: Multi-Head Attention · Softmax · Layer Normalization · Adam · Linear Layer · Dense Connections · Residual Connection · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing
