TFormer: A throughout fusion transformer for multi-modal skin lesion diagnosis

Yilan Zhang; Fengying Xie; Jianqi Chen

arXiv:2211.11393·cs.CV·March 3, 2023

TL;DR

TFormer introduces a pure transformer-based framework for multi-modal skin lesion diagnosis, effectively fusing heterogeneous data and capturing representative features across modalities for improved diagnostic accuracy.

Contribution

The paper proposes a novel Throughout Fusion Transformer (TFormer) that enhances multi-modal data integration using hierarchical multi-modal transformer blocks and a post-fusion module.

Findings

1. Effective multi-modal fusion across spatially unaligned data
2. Improved feature representation in shallow layers
3. Enhanced diagnosis performance on skin lesion datasets

Abstract

Multi-modal skin lesion diagnosis (MSLD) has achieved remarkable success with modern computer-aided diagnosis (CAD) technology based on deep convolutions. However, information aggregation across modalities in MSLD remains challenging due to severely unaligned spatial resolution (e.g., dermoscopic image and clinical image) and heterogeneous data (e.g., dermoscopic image and patients' meta-data). Limited by the intrinsic local attention, most recent MSLD pipelines using pure convolutions struggle to capture representative features in shallow layers. Fusion across different modalities is therefore usually done at the end of the pipeline, even at the last layer, leading to insufficient information aggregation. To tackle the issue, we introduce a pure transformer-based method, which we refer to as "Throughout Fusion Transformer (TFormer)", for sufficient information integration in…
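The abstract's point about spatially unaligned modalities can be illustrated with a generic cross-attention step, in which tokens from one modality attend over tokens from another even when the two sequences differ in length. The sketch below is not TFormer's actual block: the function names, the toy token values, and the omission of learned query/key/value projections and multiple heads are all simplifying assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys_values):
    """Hypothetical single-head cross-attention without learned projections.

    Each query token (modality A) attends over every token of modality B.
    The two sequences may have different lengths, so no spatial alignment
    between the modalities is required.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for q in queries:
        # Attention weights of this query over all tokens of the other modality.
        weights = softmax([dot(q, k) * scale for k in keys_values])
        # Weighted sum of the other modality's tokens.
        context = [
            sum(w * kv[i] for w, kv in zip(weights, keys_values))
            for i in range(dim)
        ]
        # Residual connection: keep the original token, add the fused context.
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused


# Toy inputs (assumed values): 3 "dermoscopic" tokens and 2 "clinical" tokens.
derm_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
clin_tokens = [[0.5, 0.5], [2.0, 0.0]]

fused = cross_attention(derm_tokens, clin_tokens)
print(len(fused), len(fused[0]))
```

The output keeps one fused token per dermoscopic token, so a step like this could in principle be applied at any depth of a network rather than only at the last layer, which is the limitation of late fusion that the abstract describes.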

Code & Models

Repositories

zylbuaa/tformer (PyTorch, official)

Taxonomy

Topics: Cutaneous Melanoma Detection and Management · Optical Coherence Tomography Applications

Methods: Multi-Head Attention · Softmax · Layer Normalization · Adam · Linear Layer · Dense Connections · Residual Connection · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing