ViTBIS: Vision Transformer for Biomedical Image Segmentation
Abhinav Sagar

TL;DR
ViTBIS is a vision transformer architecture for biomedical image segmentation. It combines multi-scale convolutions, transformer blocks, and skip connections, and it outperforms most previous CNN and transformer models on the datasets tested.
Contribution
This paper introduces ViTBIS, a new transformer-based network with multi-scale convolutions and skip connections for improved biomedical image segmentation.
Findings
Outperforms most previous CNN and transformer models on multiple datasets
Achieves higher Dice scores and lower Hausdorff distances (lower is better)
Effective multi-scale feature integration enhances segmentation accuracy
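The two metrics named in the findings can be illustrated with a minimal sketch in plain Python. This is not the paper's evaluation code: the function names, the nested-list mask format, and the use of the plain (maximum) Hausdorff distance rather than a percentile variant are all assumptions made for illustration.

```python
import math


def dice_score(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as nested lists."""
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in target for v in row]
    intersection = sum(1 for p, t in zip(flat_p, flat_t) if p and t)
    total = sum(1 for p in flat_p if p) + sum(1 for t in flat_t if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def foreground_points(mask):
    """Return (row, col) coordinates of all non-zero pixels."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff_distance(pred, target):
    """Symmetric Hausdorff distance between the foreground pixels of two masks."""
    a, b = foreground_points(pred), foreground_points(target)
    if not a or not b:
        raise ValueError("both masks need at least one foreground pixel")
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Toy masks (hypothetical data, not taken from any dataset in the paper).
pred = [[1, 1, 0],
        [0, 0, 0]]
target = [[1, 0, 0],
          [0, 0, 1]]
print(dice_score(pred, target))                    # 0.5
print(round(hausdorff_distance(pred, target), 4))  # 1.4142
```

A higher Dice score means more overlap between prediction and ground truth, while a lower Hausdorff distance means the worst-case boundary error is smaller.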
Abstract
In this paper, we propose a novel network named Vision Transformer for Biomedical Image Segmentation (ViTBIS). Our network splits the input feature maps into three parts, each processed by a convolution of a different kernel size, in both the encoder and the decoder. A concatenation operator merges the features before they are fed to three consecutive transformer blocks with an attention mechanism embedded inside them. Skip connections connect the encoder and decoder transformer blocks. Similarly, transformer blocks and the multi-scale architecture are used in the decoder before the features are linearly projected to produce the output segmentation map. We test the performance of our network on the Synapse multi-organ segmentation dataset, the Automated Cardiac Diagnosis Challenge dataset, the Brain Tumour MRI segmentation dataset and the Spleen CT segmentation dataset. Without bells and whistles, our network outperforms most…
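The encoder data flow described in the abstract (split into three parts, convolve each at a different scale, concatenate, then pass through three attention blocks) can be sketched in NumPy as follows. This is a rough illustration under several assumptions, not the author's implementation: the kernel sizes (1, 3, 5) are placeholders, a fixed mean filter stands in for learned convolutions, attention is single-head with random weights, each pixel is treated as one token, and the skip connection is modelled as a simple addition. All function names are hypothetical.

```python
import numpy as np


def box_conv2d(x, k):
    """Same-padded k x k mean filter per channel of x (C, H, W); stand-in for a learned conv."""
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros_like(x)
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (k * k)


def multi_scale_merge(x, kernel_sizes=(1, 3, 5)):
    """Split channels into three groups, filter each at its own scale, concatenate."""
    parts = np.array_split(x, len(kernel_sizes), axis=0)
    return np.concatenate(
        [box_conv2d(p, k) for p, k in zip(parts, kernel_sizes)], axis=0
    )


def self_attention_block(tokens, rng):
    """Single-head self-attention with a residual connection; weights are random."""
    d = tokens.shape[1]
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return tokens + weights @ v


def transformer_stage(x, rng, num_blocks=3):
    """Multi-scale merge followed by num_blocks attention blocks; shape is preserved."""
    merged = multi_scale_merge(x)
    c, h, w = merged.shape
    tokens = merged.reshape(c, h * w).T  # one token per pixel (assumption)
    for _ in range(num_blocks):
        tokens = self_attention_block(tokens, rng)
    return tokens.T.reshape(c, h, w)


rng = np.random.default_rng(0)
x = rng.standard_normal((6, 8, 8))             # toy feature map: 6 channels, 8 x 8
encoded = transformer_stage(x, rng)            # encoder side
decoded = transformer_stage(encoded + x, rng)  # decoder side with an additive skip (assumption)
print(encoded.shape, decoded.shape)
```

The sketch omits downsampling, upsampling, normalization, feed-forward layers, and the final linear projection to class logits, none of which are specified in the abstract text above.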
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Medical Imaging and Analysis · Brain Tumor Detection and Classification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Label Smoothing · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Absolute Position Encodings · Byte Pair Encoding
