Patcher: Patch Transformers with Mixture of Experts for Precise Medical Image Segmentation
Yanglan Ou, Ye Yuan, Xiaolei Huang, Stephen T.C. Wong, John Volpi, James Z. Wang, Kelvin Wong

TL;DR
Patcher is an encoder-decoder Vision Transformer that divides an image into overlapping large patches, applies Transformers to the small patches inside each large patch, and decodes with a mixture-of-experts decoder, combining local detail with global context to improve medical image segmentation accuracy.
Contribution
It introduces Patcher blocks with overlapping large patches and a MoE-based decoder, enhancing spatial modeling and feature specialization in medical image segmentation.
Findings
Outperforms state-of-the-art methods on stroke lesion segmentation
Achieves superior results on polyp segmentation
Demonstrates effective feature extraction from local to global levels
Abstract
We present a new encoder-decoder Vision Transformer architecture, Patcher, for medical image segmentation. Unlike standard Vision Transformers, it employs Patcher blocks that segment an image into large patches, each of which is further divided into small patches. Transformers are applied to the small patches within a large patch, which constrains the receptive field of each pixel. We intentionally make the large patches overlap to enhance intra-patch communication. The encoder employs a cascade of Patcher blocks with increasing receptive fields to extract features from local to global levels. This design allows Patcher to benefit from both the coarse-to-fine feature extraction common in CNNs and the superior spatial relationship modeling of Transformers. We also propose a new mixture-of-experts (MoE) based decoder, which treats the feature maps from the encoder as experts and selects a…
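The two-level partition described in the abstract can be sketched with plain coordinate arithmetic. This is a minimal illustration only, not the authors' implementation: the function names, the `size`/`stride` parameterization of the overlap, and the example dimensions are all assumptions made for this sketch.

```python
from typing import List, Tuple

# (top, left, bottom, right); bottom and right are exclusive.
Box = Tuple[int, int, int, int]


def large_patch_boxes(height: int, width: int, size: int, stride: int) -> List[Box]:
    """Return boxes of size x size large patches placed every `stride` pixels.

    Hypothetical helper: choosing stride < size makes neighbouring
    large patches overlap, as the abstract describes.
    """
    if not 0 < stride <= size:
        raise ValueError("need 0 < stride <= size")
    if height < size or width < size:
        raise ValueError("image is smaller than one large patch")
    if (height - size) % stride or (width - size) % stride:
        raise ValueError("size and stride must tile the image exactly")
    return [
        (top, left, top + size, left + size)
        for top in range(0, height - size + 1, stride)
        for left in range(0, width - size + 1, stride)
    ]


def small_patch_boxes(box: Box, small: int) -> List[Box]:
    """Split one large patch into non-overlapping small x small patches.

    In this sketch, each returned box corresponds to one token, and
    attention would be restricted to tokens from the same large patch.
    """
    top, left, bottom, right = box
    if (bottom - top) % small or (right - left) % small:
        raise ValueError("small patch size must divide the large patch")
    return [
        (t, l, t + small, l + small)
        for t in range(top, bottom, small)
        for l in range(left, right, small)
    ]


if __name__ == "__main__":
    # Assumed example: 64x64 image, 32x32 large patches, stride 16, 8x8 small patches.
    large = large_patch_boxes(64, 64, size=32, stride=16)
    tokens = small_patch_boxes(large[0], small=8)
    print(len(large), "large patches,", len(tokens), "small patches in each")
```

With these assumed numbers, the image yields a 3 × 3 grid of large patches, and each large patch holds 16 small patches. Adjacent large patches such as `(0, 0, 32, 32)` and `(0, 16, 32, 48)` share a 16-pixel-wide strip, so pixels in that strip are processed in the context of more than one large patch.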
Taxonomy
Topics: Advanced Neural Network Applications · Medical Image Segmentation Techniques · Brain Tumor Detection and Classification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Byte Pair Encoding · Adam · Label Smoothing · Residual Connection · Dropout
