Diff3Dformer: Leveraging Slice Sequence Diffusion for Enhanced 3D CT Classification with Transformer Networks

Zihao Jin; Yingying Fang; Jiahao Huang; Caiwen Xu; Simon Walsh; Guang Yang

arXiv:2406.17173 · eess.IV · June 28, 2024

Open Access

TL;DR

Diff3Dformer is a diffusion-based 3D Vision Transformer that builds slice sequences from the latent space of a diffusion model and uses clustering attention to aggregate repetitive information across slices, reducing overfitting and improving 3D CT classification on small datasets.

Contribution

The paper presents a Diffusion-based 3D Vision Transformer that improves classification on small datasets by combining the latent space of a diffusion model with a clustering attention mechanism.

Findings

01 Outperforms existing 3D methods on lung CT classification

02 Demonstrates robustness across different dataset scales

03 Surpasses state-of-the-art transformer approaches developed during the COVID-19 pandemic

Abstract

The manifestation of symptoms associated with lung diseases can vary at different depths for individual patients, highlighting the significance of 3D information in CT scans for medical image classification. While Vision Transformers (ViTs) have shown superior performance over convolutional neural networks in image classification tasks, their effectiveness is often demonstrated on sufficiently large 2D datasets, and they easily encounter overfitting issues on small medical image datasets. To address this limitation, we propose a Diffusion-based 3D Vision Transformer (Diff3Dformer), which utilizes the latent space of the diffusion model to form the slice sequence for 3D analysis and incorporates clustering attention into ViT to aggregate repetitive information within 3D CT scans, thereby harnessing the power of the advanced transformer in 3D classification tasks on small datasets. Our method…
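To make the idea of aggregating repetitive information across slices more concrete, here is a minimal sketch. It is not the paper's clustering attention: plain k-means is swapped in as a stand-in, and the function name `aggregate_slice_tokens`, the latent shapes, and the random input are all hypothetical. It only illustrates how many per-slice latent vectors could be condensed into a shorter sequence of tokens before being passed to a transformer.

```python
import numpy as np


def aggregate_slice_tokens(latents, n_clusters, n_iters=10, seed=0):
    """Group similar slice latents with plain k-means and average each group.

    Illustrative stand-in only, not the clustering attention from the paper.
    latents: array of shape (n_slices, dim), one hypothetical latent per CT slice.
    Returns (tokens, labels): tokens has shape (n_clusters, dim) and
    labels has shape (n_slices,).
    """
    latents = np.asarray(latents, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from distinct, randomly chosen slices (fancy indexing copies).
    start = rng.choice(len(latents), size=n_clusters, replace=False)
    centroids = latents[start]

    def assign(points, centres):
        # Squared Euclidean distance from every slice to every centroid.
        dists = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        return dists.argmin(axis=1)

    for _ in range(n_iters):
        labels = assign(latents, centroids)
        for k in range(n_clusters):
            members = latents[labels == k]
            if len(members) > 0:  # keep the previous centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)

    # Final assignment against the updated centroids.
    labels = assign(latents, centroids)
    return centroids, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 64 made-up slice latents of dimension 16 (assumed sizes, not from the paper).
    fake_latents = rng.normal(size=(64, 16))
    tokens, labels = aggregate_slice_tokens(fake_latents, n_clusters=8)
    print(tokens.shape, labels.shape)  # (8, 16) (64,)
```

In this sketch the 64 slice vectors are reduced to 8 cluster tokens, which shortens the sequence a transformer would have to attend over. The actual method may weight or select slices quite differently.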

Taxonomy

Topics: Medical Image Segmentation Techniques · Medical Imaging and Analysis · AI in cancer detection

Methods: Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Diffusion · Position-Wise Feed-Forward Layer · Dropout · Adam · Attention Is All You Need · Linear Layer