ViTBIS: Vision Transformer for Biomedical Image Segmentation

Abhinav Sagar

arXiv:2201.05920·eess.IV·January 19, 2022

TL;DR

ViTBIS is a vision transformer architecture for biomedical image segmentation. It combines multi-scale convolutions, transformer blocks, and skip connections, and is reported to outperform most previous CNN and transformer models on four datasets (Synapse multi-organ, Automated Cardiac Diagnosis Challenge, brain tumour MRI, and spleen CT).

Contribution

This paper introduces ViTBIS, a new transformer-based network with multi-scale convolutions and skip connections for improved biomedical image segmentation.

Findings

01. Outperforms previous CNN and transformer models on multiple datasets

02. Achieves higher Dice scores and better Hausdorff distances

03. Effective multi-scale feature integration enhances segmentation accuracy
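
For readers unfamiliar with the two metrics named in the findings, the following is a minimal sketch of how each is commonly defined. It is not the paper's evaluation code: the function names are hypothetical, the masks are flat binary lists, and the Hausdorff distance shown is the plain maximum form (this summary does not say whether the paper uses that or a percentile variant).

```python
from math import dist


def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(dist(p, q) for q in dst) for p in src)

    return max(directed(points_a, points_b), directed(points_b, points_a))


# Toy inputs (hypothetical, for illustration only).
overlap = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
boundary_gap = hausdorff_distance([(0, 0), (1, 0)], [(0, 0), (4, 0)])
```

A higher Dice score means more overlap between prediction and ground truth, while a lower Hausdorff distance means the worst-case boundary error is smaller, which is why the two are usually reported together.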

Abstract

In this paper, we propose a novel network named Vision Transformer for Biomedical Image Segmentation (ViTBIS). Our network splits the input feature maps into three parts with $1 \times 1$, $3 \times 3$ and $5 \times 5$ convolutions in both the encoder and the decoder. A concat operator merges the features before they are fed to three consecutive transformer blocks with an attention mechanism embedded inside them. Skip connections connect the encoder and decoder transformer blocks. Similarly, transformer blocks and a multi-scale architecture are used in the decoder before the features are linearly projected to produce the output segmentation map. We test the performance of our network on the Synapse multi-organ segmentation dataset, the Automated Cardiac Diagnosis Challenge dataset, the Brain tumour MRI segmentation dataset and the Spleen CT segmentation dataset. Without bells and whistles, our network outperforms most…
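
The multi-scale step described in the abstract can be sketched roughly as follows. This is an illustrative NumPy reading, not the authors' implementation: it assumes "splits the input feature maps into three parts" means a channel-wise split (the alternative, applying all three kernel sizes to the full input, is equally consistent with the abstract), and the function names, tensor sizes, and random weights are all hypothetical. The transformer blocks and skip connections are omitted.

```python
import numpy as np


def conv2d_same(x, weight):
    """Naive zero-padded 'same' 2D convolution (cross-correlation).

    x: (C_in, H, W); weight: (C_out, C_in, k, k) with odd k.
    """
    c_out, _, k, _ = weight.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, w = x.shape
    out = np.zeros((c_out, h, w))
    for i in range(k):
        for j in range(k):
            # Shifted window of the padded input, contracted over input channels.
            window = xp[:, i:i + h, j:j + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, i, j], window)
    return out


def multi_scale_block(x, weights):
    """Split channels into three groups, convolve each with its own kernel size
    (1x1, 3x3, 5x5 here), then concatenate the results along the channel axis."""
    parts = np.array_split(x, 3, axis=0)
    outs = [conv2d_same(p, w) for p, w in zip(parts, weights)]
    return np.concatenate(outs, axis=0)


rng = np.random.default_rng(0)
x = rng.standard_normal((6, 8, 8))  # hypothetical: 6 channels, 8x8 feature map
weights = [rng.standard_normal((2, 2, k, k)) for k in (1, 3, 5)]
y = multi_scale_block(x, weights)   # same spatial size, channels re-merged
```

Because every branch uses "same" padding, the three outputs share one spatial size and can be concatenated directly, which is what lets the merged features pass on to the next stage as a single tensor.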

Taxonomy

Topics: Radiomics and Machine Learning in Medical Imaging · Medical Imaging and Analysis · Brain Tumor Detection and Classification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Label Smoothing · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Absolute Position Encodings · Byte Pair Encoding