Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS Instance Segmentation
Chengxi Zeng, Xinyu Yang, David Smithard, Majid Mirmehdi, Alberto M. Gambaruto, Tilo Burghardt

TL;DR
This paper introduces Video-SwinUNet, a deep learning framework that effectively captures spatio-temporal features for medical video segmentation, significantly improving performance on VFSS datasets by integrating CNNs, transformers, and temporal feature blending.
Contribution
The paper proposes a novel spatio-temporal deep learning framework combining CNNs, transformers, and temporal feature blending for improved medical video segmentation.
Findings
Achieves Dice coefficients of 0.8986 and 0.8186 on the VFSS2022 datasets.
Outperforms existing methods significantly in segmentation benchmarks.
Demonstrates effective cross-dataset transferability of the model.
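For readers unfamiliar with the metric reported in the findings, the Dice coefficient measures the overlap between a predicted mask and a ground-truth mask as 2|P ∩ T| / (|P| + |T|). The following is a minimal sketch in plain Python, not the paper's evaluation code; the function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two binary masks given as nested lists (rows of 0/1).

    Illustrative helper only; the name and the empty-mask convention are assumptions.
    """
    # Flatten both masks and coerce every pixel to 0 or 1.
    flat_pred = [int(bool(v)) for row in pred for v in row]
    flat_target = [int(bool(v)) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must contain the same number of pixels")

    # |P ∩ T|: pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    # |P| + |T|: total foreground pixels across the two masks.
    total = sum(flat_pred) + sum(flat_target)

    if total == 0:
        return 1.0  # Assumed convention: two empty masks agree perfectly.
    return 2.0 * intersection / total


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
score = dice_coefficient(pred, target)  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they share no foreground pixels, so the reported values of 0.8986 and 0.8186 indicate substantial but not perfect overlap.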
Abstract
This paper presents a deep learning framework for medical video segmentation. Convolutional neural network (CNN) and transformer-based methods have achieved great milestones in medical image segmentation tasks due to their incredible semantic feature encoding and global information comprehension abilities. However, most existing approaches ignore a salient aspect of medical video data - the temporal dimension. Our proposed framework explicitly extracts features from neighbouring frames across the temporal dimension and incorporates them with a temporal feature blender, which then tokenises the high-level spatio-temporal feature to form a strong global feature encoded via a Swin Transformer. The final segmentation results are produced via a UNet-like encoder-decoder architecture. Our model outperforms other approaches by a significant margin and improves the segmentation benchmarks on the…
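The idea of blending features from neighbouring frames can be illustrated with a toy example. The sketch below is not the paper's temporal feature blender, whose design is not described on this page. It is a stand-in that assumes one simple form of blending: each frame's feature vector is weighted by the softmax of its scaled dot-product similarity to the centre frame, and the weighted vectors are summed. The names `softmax` and `blend_temporal_features` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def blend_temporal_features(frame_features, centre_index):
    """Blend per-frame feature vectors into one vector for the centre frame.

    Assumed stand-in for a temporal blender: attention-style weighting by
    similarity to the centre frame. Returns (blended_vector, weights).
    """
    centre = frame_features[centre_index]
    scale = math.sqrt(len(centre))

    # Similarity of every frame (including the centre) to the centre frame.
    scores = [
        sum(a * b for a, b in zip(features, centre)) / scale
        for features in frame_features
    ]
    weights = softmax(scores)

    # Weighted sum of the frame features, one output value per channel.
    blended = [
        sum(w * features[i] for w, features in zip(weights, frame_features))
        for i in range(len(centre))
    ]
    return blended, weights


# Three frames with two-channel features; the middle frame is the target.
frames = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
blended, weights = blend_temporal_features(frames, centre_index=1)
```

In a full pipeline of the kind the abstract outlines, a blended feature map like this would then be split into tokens for the transformer encoder, and a decoder would upsample the result into a segmentation mask. Those stages are omitted here.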
Taxonomy
Topics: COVID-19 diagnosis using AI · Radiomics and Machine Learning in Medical Imaging · AI in cancer detection
Methods: Attention Is All You Need · Linear Layer · Dropout · Byte Pair Encoding · Adam · Multi-Head Attention · Residual Connection · Layer Normalization · Stochastic Depth · Softmax
