Video-TransUNet: Temporally Blended Vision Transformer for CT VFSS Instance Segmentation

Chengxi Zeng; Xinyu Yang; Majid Mirmehdi; Alberto M Gambaruto; Tilo Burghardt

arXiv:2208.08315 · eess.IV · August 23, 2022 · 1 cites

TL;DR

Video-TransUNet is a novel deep learning architecture that combines CNNs, transformers, and temporal feature blending to improve instance segmentation accuracy in medical CT videos, specifically for swallowing studies.

Contribution

The paper introduces Video-TransUNet, which integrates temporal feature blending into the TransUNet framework to improve segmentation in medical videos and outperforms the state-of-the-art systems it is compared against.

Findings

01

Achieves a Dice coefficient of 0.8796 and an average surface distance of 1.0379 pixels on the VFSS2022 dataset.

02

Significantly outperforms other state-of-the-art systems on bolus and pharynx/larynx segmentation in VFSS sequences.

03

Provides open-source code and annotations for reproducibility.
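
As a point of reference for the Dice figure in the findings above, here is a minimal sketch of how a Dice coefficient is commonly computed for a pair of binary masks. It is not taken from the paper's code: the function name `dice_coefficient`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks (hypothetical helper)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(2.0 * intersection / denom)


# Toy 3x3 masks: 3 predicted pixels, 2 ground-truth pixels, 2 shared.
pred = np.array([[0, 1, 1],
                 [0, 1, 0],
                 [0, 0, 0]])
target = np.array([[0, 1, 1],
                   [0, 0, 0],
                   [0, 0, 0]])

print(dice_coefficient(pred, target))  # 2 * 2 / (3 + 2) = 0.8
```

A value of 1.0 means the predicted and reference masks coincide exactly; 0.0 means they share no pixels.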

Abstract

We propose Video-TransUNet, a deep architecture for instance segmentation in medical CT videos constructed by integrating temporal feature blending into the TransUNet deep learning framework. In particular, our approach amalgamates strong frame representation via a ResNet CNN backbone, multi-frame feature blending via a Temporal Context Module (TCM), non-local attention via a Vision Transformer, and reconstructive capabilities for multiple targets via a UNet-based convolutional-deconvolutional architecture with multiple heads. We show that this new network design can significantly outperform other state-of-the-art systems when tested on the segmentation of bolus and pharynx/larynx in Videofluoroscopic Swallowing Study (VFSS) CT sequences. On our VFSS2022 dataset it achieves a dice coefficient of 0.8796 and an average surface distance of 1.0379 pixels. Note that tracking the pharyngeal…
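
The abstract also reports an average surface distance in pixels. The sketch below shows one common symmetric formulation of that metric for 2D binary masks. The exact definition used in the paper is not given in this listing, so the boundary rule (4-connectivity), the symmetric averaging, and the helper names `boundary_pixels` and `average_surface_distance` are all assumptions.

```python
import numpy as np


def boundary_pixels(mask):
    """Coordinates of foreground pixels with at least one 4-neighbour in the background."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    # A pixel is interior when its up, down, left and right neighbours are all foreground.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(mask & ~interior)


def average_surface_distance(pred, target):
    """Symmetric mean of nearest-boundary distances, in pixels (assumed definition)."""
    a = boundary_pixels(pred).astype(float)
    b = boundary_pixels(target).astype(float)
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both masks need at least one foreground pixel")
    # Pairwise Euclidean distances between the two boundary point sets.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return float((d.min(axis=1).sum() + d.min(axis=0).sum()) / (len(a) + len(b)))


# Toy example: a 3x3 square and the same square shifted right by one pixel.
pred = np.zeros((6, 6), dtype=bool)
pred[1:4, 1:4] = True
target = np.zeros((6, 6), dtype=bool)
target[1:4, 2:5] = True

print(average_surface_distance(pred, target))
```

The brute-force pairwise distance matrix is fine for small masks; for full-resolution frames a distance transform would be the more usual choice.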

Taxonomy

Topics: Dysphagia Assessment and Management · Tracheal and airway disorders · Voice and Speech Disorders

Methods: Attention Is All You Need · Linear Layer · 1x1 Convolution · Batch Normalization · Label Smoothing · Bottleneck Residual Block · Position-Wise Feed-Forward Layer · Residual Connection · Adam