Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS Instance Segmentation

Chengxi Zeng; Xinyu Yang; David Smithard; Majid Mirmehdi; Alberto M. Gambaruto; Tilo Burghardt

arXiv:2302.11325 · cs.CV · February 13, 2024 · 1 citation

TL;DR

This paper introduces Video-SwinUNet, a deep learning framework that effectively captures spatio-temporal features for medical video segmentation, significantly improving performance on VFSS datasets by integrating CNNs, transformers, and temporal feature blending.

Contribution

The paper proposes a novel spatio-temporal deep learning framework combining CNNs, transformers, and temporal feature blending for improved medical video segmentation.
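To make "temporal feature blending" concrete, here is a minimal sketch of one way to fuse CNN feature maps from neighbouring frames into a single map for the frame being segmented. It is a hypothetical stand-in, not the paper's temporal feature blender: the function name `blend_temporal_features`, the (T, C, H, W) layout, and the choice of softmax-weighted cosine similarity to the centre frame are all illustrative assumptions.

```python
import numpy as np


def blend_temporal_features(frames_feats: np.ndarray, center: int) -> np.ndarray:
    """Fuse per-frame feature maps into one map for the centre frame.

    Hypothetical sketch (not the paper's blender): each neighbouring frame is
    weighted by the softmax of its cosine similarity to the centre frame.

    frames_feats: shape (T, C, H, W), one CNN feature map per frame.
    center: index of the frame being segmented.
    Returns an array of shape (C, H, W).
    """
    t = frames_feats.shape[0]
    flat = frames_feats.reshape(t, -1)
    # L2-normalise each frame's flattened features (epsilon avoids divide-by-zero).
    unit = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    # Cosine similarity of every frame to the centre frame, shape (T,).
    sims = unit @ unit[center]
    # Softmax over the temporal axis (subtract the max for numerical stability).
    weights = np.exp(sims - sims.max())
    weights /= weights.sum()
    # Weighted sum over time: contracts the T axis, leaving (C, H, W).
    return np.tensordot(weights, frames_feats, axes=1)


# Usage: three frames (t-1, t, t+1), each with an 8-channel 4x4 feature map.
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 8, 4, 4))
blended = blend_temporal_features(feats, center=1)
print(blended.shape)  # (8, 4, 4)
```

Because the weights sum to one, the blended map stays on the same scale as a single frame's features; a learned blender would replace the fixed similarity rule with trainable parameters.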

Findings

1. Achieved Dice coefficients of 0.8986 and 0.8186 on the two VFSS2022 datasets tested.

2. Outperforms existing methods by a significant margin on segmentation benchmarks.

3. Demonstrates effective cross-dataset transferability of the model.
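The reported scores use the Dice coefficient, Dice = 2|A ∩ B| / (|A| + |B|), where A is the predicted mask and B the ground-truth mask. The sketch below computes it for a single pair of binary masks; the function name and the small `eps` smoothing term are assumptions for illustration, and how the paper aggregates scores across frames or videos is not stated on this page.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks A (prediction) and B (ground truth)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    # eps keeps the score defined (1.0) when both masks are empty.
    return float((2.0 * intersection + eps) / (total + eps))


pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
# 2 overlapping pixels, 3 + 3 foreground pixels: 2 * 2 / 6
print(dice_coefficient(pred, target))  # roughly 0.6667
```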

Abstract

This paper presents a deep learning framework for medical video segmentation. Convolutional neural network (CNN) and transformer-based methods have achieved great milestones in medical image segmentation tasks due to their incredible semantic feature encoding and global information comprehension abilities. However, most existing approaches ignore a salient aspect of medical video data: the temporal dimension. Our proposed framework explicitly extracts features from neighbouring frames across the temporal dimension and incorporates them with a temporal feature blender, which then tokenises the high-level spatio-temporal features to form a strong global feature encoded via a Swin Transformer. The final segmentation results are produced via a UNet-like encoder-decoder architecture. Our model outperforms other approaches by a significant margin and improves the segmentation benchmarks on the…
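The abstract says the blended feature map is tokenised and encoded by a Swin Transformer. The sketch below shows the generic Swin-style step of treating each spatial position as a token and grouping tokens into non-overlapping windows, inside which Swin computes self-attention. It is illustrative only: the helper names `tokenise` and `window_partition`, the window size of 4, and the omission of a learned patch-embedding projection and of shifted windows are all simplifying assumptions, not details taken from the paper.

```python
import numpy as np


def tokenise(feature_map: np.ndarray) -> np.ndarray:
    """Reinterpret a (C, H, W) feature map as an (H, W, C) grid of C-dim tokens.

    A real model would also apply a learned linear projection (patch embedding).
    """
    return np.transpose(feature_map, (1, 2, 0))


def window_partition(tokens: np.ndarray, window: int) -> np.ndarray:
    """Split an (H, W, C) token grid into non-overlapping window x window groups.

    Returns shape (num_windows, window * window, C), windows ordered row-major.
    """
    h, w, c = tokens.shape
    assert h % window == 0 and w % window == 0, "H and W must be divisible by window"
    x = tokens.reshape(h // window, window, w // window, window, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (row blocks, col blocks, window, window, C)
    return x.reshape(-1, window * window, c)


feature_map = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)  # C=2, H=W=8
windows = window_partition(tokenise(feature_map), window=4)
print(windows.shape)  # (4, 16, 2)
```

Restricting attention to windows keeps its cost linear in the number of windows rather than quadratic in the total number of tokens, which is the usual motivation for Swin over a plain ViT on dense feature maps.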

Taxonomy

Topics: COVID-19 diagnosis using AI · Radiomics and Machine Learning in Medical Imaging · AI in cancer detection

Methods: Attention Is All You Need · Linear Layer · Dropout · Byte Pair Encoding · Adam · Multi-Head Attention · Residual Connection · Layer Normalization · Stochastic Depth · Softmax