CATS v2: Hybrid encoders for robust medical segmentation
Hao Li, Han Liu, Dewei Hu, Xing Yao, Jiacheng Wang, Ipek Oguz

TL;DR
CATS v2 introduces a hybrid encoder that combines a CNN path with a transformer path built on shifted-window attention, improving 3D medical image segmentation accuracy by leveraging both local and global features.
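The TL;DR mentions shifted window mechanisms. Below is a minimal NumPy sketch of how shifted windows are commonly formed. It assumes a Swin-Transformer-style scheme: a cyclic shift by half a window, applied to a channels-last 3D feature volume. The function names are hypothetical and are not taken from the CATS v2 code. The attention mask that Swin applies to wrapped-around windows is omitted.

```python
import numpy as np

def window_partition_3d(x, w):
    """Split a (D, H, W, C) volume into non-overlapping w*w*w windows.

    Returns (num_windows, w**3, C): one token sequence per window, which is
    what windowed self-attention operates on.
    """
    d, h, wd, c = x.shape
    assert d % w == 0 and h % w == 0 and wd % w == 0, "volume must tile evenly"
    x = x.reshape(d // w, w, h // w, w, wd // w, w, c)
    # Move the three window-index axes to the front, in-window axes after.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, w ** 3, c)

def shifted_window_partition_3d(x, w):
    """Cyclically shift the volume by w // 2 voxels along each spatial axis,
    then partition. The new windows straddle the borders of the unshifted
    windows, so information can cross them in the next attention layer.
    (Assumption: Swin-style cyclic shift; the wrap-around mask is omitted.)"""
    s = w // 2
    shifted = np.roll(x, shift=(-s, -s, -s), axis=(0, 1, 2))
    return window_partition_3d(shifted, w)

x = np.arange(8 * 8 * 8).reshape(8, 8, 8, 1)   # voxel value = 64*d + 8*h + w
regular = window_partition_3d(x, 4)             # 8 windows of 4x4x4 tokens
shifted = shifted_window_partition_3d(x, 4)     # same shape, offset by 2 voxels
print(regular.shape, shifted.shape)             # (8, 64, 1) (8, 64, 1)
print(shifted[0, 0, 0])                         # 146, the voxel at (2, 2, 2)
```

Alternating regular and shifted partitions keeps each attention call local and cheap. Over two consecutive layers, information can still propagate across window boundaries.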
Contribution
This work extends the previous CATS model by integrating hybrid encoders with CNN and transformer components, improving segmentation robustness and accuracy across multiple datasets.
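The exact block design and fusion rule of CATS v2 are not described on this page. The sketch below therefore only illustrates the general idea of a hybrid encoder stage, under explicitly hypothetical choices:
- the CNN path is a plain 3×3×3 convolution;
- the transformer path is single-head self-attention over all voxels, with no windowing, for brevity;
- the two outputs are fused by element-wise addition.

None of these names or choices should be read as the authors' implementation.

```python
import numpy as np

def conv3d_same(x, kernel):
    """Local (CNN) path: 'same'-padded 3D convolution on a (D, H, W, C_in)
    volume with a (k, k, k, C_in, C_out) kernel. Each output voxel depends
    only on its k*k*k neighbourhood."""
    d, h, w, _ = x.shape
    k = kernel.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (p, p), (0, 0)))
    out = np.zeros((d, h, w, kernel.shape[-1]))
    for i in range(k):
        for j in range(k):
            for l in range(k):
                out += xp[i:i + d, j:j + h, l:l + w, :] @ kernel[i, j, l]
    return out

def self_attention(x, wq, wk, wv):
    """Global (transformer) path: single-head self-attention over all voxel
    tokens, so each output voxel can depend on the whole volume."""
    d, h, w, c = x.shape
    tokens = x.reshape(-1, c)                        # (N, C_in)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (N, N)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)         # row-wise softmax
    return (attn @ v).reshape(d, h, w, -1)

def hybrid_stage(x, params):
    """Hypothetical hybrid stage: both paths see the same input and their
    outputs are fused by element-wise addition (one of several options)."""
    local = conv3d_same(x, params["kernel"])
    global_ = self_attention(x, params["wq"], params["wk"], params["wv"])
    return local + global_

rng = np.random.default_rng(0)
c_in, c_out = 4, 8
params = {
    "kernel": rng.normal(size=(3, 3, 3, c_in, c_out)) * 0.1,
    "wq": rng.normal(size=(c_in, c_out)) * 0.1,
    "wk": rng.normal(size=(c_in, c_out)) * 0.1,
    "wv": rng.normal(size=(c_in, c_out)),
}
x = rng.normal(size=(6, 6, 6, c_in))
y = hybrid_stage(x, params)                          # shape (6, 6, 6, 8)

# Perturb the corner voxel (0, 0, 0) and inspect the opposite corner (5, 5, 5).
x2 = x.copy()
x2[0, 0, 0] += 1.0
delta_local = (conv3d_same(x2, params["kernel"])[5, 5, 5]
               - conv3d_same(x, params["kernel"])[5, 5, 5])
delta_global = (self_attention(x2, params["wq"], params["wk"], params["wv"])[5, 5, 5]
                - self_attention(x, params["wq"], params["wk"], params["wv"])[5, 5, 5])
print(np.allclose(delta_local, 0))       # True: (0, 0, 0) is outside the 3x3x3 window
print(np.abs(delta_global).max() > 1e-6)  # True: attention reaches every voxel
```

The perturbation check makes the local/global split concrete. The convolution output at the far corner does not react to the change, while the attention output does.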
Findings
Outperforms state-of-the-art methods in Dice scores
Effective fusion of local and global features
Demonstrates robustness across diverse datasets
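The findings are stated in terms of Dice scores. For a predicted mask P and a ground-truth mask T, the Dice similarity coefficient is 2|P ∩ T| / (|P| + |T|). Below is a minimal pure-Python sketch for binary masks. Returning 1.0 when both masks are empty is an assumed convention, since evaluation protocols differ on this point.

```python
def dice_score(pred, target):
    """Dice similarity coefficient 2|P ∩ T| / (|P| + |T|) for two binary
    masks given as equal-length sequences of 0/1 (or bool) voxels."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # both masks empty: scored as agreement (assumed convention)
    return 2.0 * intersection / total


pred   = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
print(dice_score(pred, target))   # 0.6666666666666666, i.e. 2 * 2 / (3 + 3)
```

For multi-class or multi-region segmentation, Dice is typically computed per class or region and then averaged.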
Abstract
Convolutional Neural Networks (CNNs) have exhibited strong performance in medical image segmentation tasks by capturing high-level (local) information, such as edges and textures. However, due to the limited field of view of the convolution kernel, it is hard for CNNs to fully represent global information. Recently, transformers have shown good performance for medical image segmentation due to their ability to better model long-range dependencies. Nevertheless, transformers struggle to capture high-level spatial features as effectively as CNNs. A good segmentation model should learn a better representation from local and global features to be both precise and semantically accurate. In our previous work, we proposed CATS, which is a U-shaped segmentation network augmented with a transformer encoder. In this work, we further extend this model and propose CATS v2 with hybrid encoders.…
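Receptive-field arithmetic makes the abstract's point about the limited field of view of a convolution kernel concrete. Consider L stacked convolutions with kernel size k, stride 1 and no dilation. Each layer widens the receptive field by k − 1 voxels per axis, giving 1 + L(k − 1). A global self-attention layer, by contrast, lets every output token depend on every input token after a single layer. Windowed attention is restricted to its window, which is why window shifting is used. The helper below is a hypothetical sketch using the standard recurrence. The 128-voxel axis in the last line is purely illustrative.

```python
def receptive_field(layers):
    """Receptive field, in voxels along one axis, of a stack of convolutions.

    layers: (kernel_size, stride) pairs in the order they are applied.
    Standard recurrence: r <- r + (k - 1) * jump, where jump is the product
    of the strides of all earlier layers. Assumes no dilation.
    """
    r, jump = 1, 1
    for k, s in layers:
        r += (k - 1) * jump
        jump *= s
    return r


print(receptive_field([(3, 1)] * 4))    # 9, i.e. 1 + 4 * (3 - 1)
print(receptive_field([(3, 2)] * 4))    # 31: strided layers grow it faster
# Illustrative 128-voxel axis: stride-1 3x3x3 convs need 64 layers to span it.
print(receptive_field([(3, 1)] * 64))   # 129
```

Real encoders downsample between stages, which enlarges the receptive field much faster than stride-1 stacking. Even so, early CNN layers remain local, whereas a global attention layer is not.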
Taxonomy
Topics: Medical Imaging and Analysis · Advanced Neural Network Applications · Artificial Intelligence in Healthcare and Education
Methods: Convolution
