A Data-scalable Transformer for Medical Image Segmentation: Architecture, Model Efficiency, and Benchmark

Yunhe Gao; Mu Zhou; Di Liu; Zhennan Yan; Shaoting Zhang; Dimitris N. Metaxas

arXiv:2203.00131·eess.IV·April 6, 2023·64 cites

TL;DR

MedFormer is a data-scalable Transformer architecture for 3D medical image segmentation that learns from limited data without pre-training, generalizes across diverse tasks, and outperforms CNNs and existing vision Transformers on seven public datasets.

Contribution

The paper introduces MedFormer, a data-scalable Transformer for 3D medical image segmentation that combines hierarchical modeling with multi-scale feature fusion and does not require pre-training.

Findings

01 Outperforms CNNs and vision Transformers on seven public datasets

02 Effective across multiple modalities like CT and MRI

03 Handles diverse targets including organs, tissues, and tumors

Abstract

Transformers have demonstrated remarkable performance in natural language processing and computer vision. However, existing vision Transformers struggle to learn from limited medical data and are unable to generalize on diverse medical image tasks. To tackle these challenges, we present MedFormer, a data-scalable Transformer designed for generalizable 3D medical image segmentation. Our approach incorporates three key elements: a desirable inductive bias, hierarchical modeling with linear-complexity attention, and multi-scale feature fusion that integrates spatial and semantic information globally. MedFormer can learn across tiny- to large-scale data without pre-training. Comprehensive experiments demonstrate MedFormer's potential as a versatile segmentation backbone, outperforming CNNs and vision Transformers on seven public datasets covering multiple modalities (e.g., CT and MRI) and…
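To make the phrase "linear-complexity attention" concrete, here is a minimal sketch of one generic way to get attention cost that grows linearly with the number of tokens: every token still issues a query, but keys and values come from a small, fixed number of pooled summary tokens. This is an illustration under stated assumptions, not MedFormer's actual attention mechanism, which the abstract does not specify. The function names (`attend`, `pool_tokens`, `reduced_attention`, `score_count`), the average pooling, and the omission of learned projections are all hypothetical simplifications.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention on plain lists of vectors."""
    dim = len(queries[0])
    out = []
    for q in queries:
        # One score per (query, key) pair: this is the dominant cost.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return out


def pool_tokens(tokens, m):
    """Average-pool N tokens into m summary tokens (assumed simplification)."""
    n = len(tokens)
    if m <= 0 or n % m != 0:
        raise ValueError("m must be positive and divide the number of tokens")
    group = n // m
    pooled = []
    for i in range(m):
        chunk = tokens[i * group:(i + 1) * group]
        pooled.append([sum(t[c] for t in chunk) / group for c in range(len(chunk[0]))])
    return pooled


def reduced_attention(tokens, m):
    """All N tokens query only m pooled tokens, so cost is N * m, not N * N."""
    summary = pool_tokens(tokens, m)
    return attend(tokens, summary, summary)


def score_count(n_queries, n_keys):
    """Number of query-key scores evaluated by attend()."""
    return n_queries * n_keys
```

With the number of summary tokens `m` held fixed, the score count `N * m` grows linearly in `N`, whereas full self-attention evaluates `N * N` scores. For 3D volumes, where `N` is the number of voxels or patches, that difference is what makes attention at high-resolution stages affordable.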

Taxonomy

Topics: Radiomics and Machine Learning in Medical Imaging · AI in cancer detection · COVID-19 diagnosis using AI

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Pointwise Convolution · Dense Connections · Softmax · Absolute Position Encodings · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Residual Connection