Med-TTT: Vision Test-Time Training model for Medical Image Segmentation
Jiashu Xu

TL;DR
Med-TTT is a vision test-time training model that combines dynamic parameter adjustment at inference, multi-resolution feature fusion, and frequency-domain enhancements to improve medical image segmentation accuracy and robustness against complex backgrounds.
Contribution
The paper presents Med-TTT, a new model integrating Vision-TTT layers, multi-resolution fusion, and frequency domain strategies for improved medical image segmentation.
Findings
Outperforms existing methods on multiple datasets
Achieves higher accuracy, sensitivity, and Dice coefficient
Effective in complex background segmentation
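For readers unfamiliar with the metrics named in the findings, the sketch below computes accuracy, sensitivity, and the Dice coefficient for a pair of binary masks. It is an illustration only: the function name `segmentation_metrics`, the flattened-list input format, and the convention of returning 1.0 when a denominator is zero are assumptions made here, not details taken from the paper.

```python
def segmentation_metrics(pred, target):
    """Compute accuracy, sensitivity, and Dice for two flattened binary masks.

    pred, target: equal-length sequences of 0/1 values (hypothetical format).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    tp = fp = fn = tn = 0
    for p, t in zip(pred, target):
        if p and t:
            tp += 1          # foreground correctly predicted
        elif p and not t:
            fp += 1          # background predicted as foreground
        elif not p and t:
            fn += 1          # foreground missed
        else:
            tn += 1          # background correctly predicted

    total = tp + fp + fn + tn
    dice_den = 2 * tp + fp + fn
    # Assumed convention: an empty denominator counts as a perfect score.
    return {
        "accuracy": (tp + tn) / total if total else 1.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 1.0,
        "dice": 2 * tp / dice_den if dice_den else 1.0,
    }


metrics = segmentation_metrics(
    pred=[1, 1, 1, 0, 0, 0, 0, 0],
    target=[1, 1, 0, 1, 0, 0, 0, 0],
)
print(metrics)
```

In this toy example there are 2 true positives, 1 false positive, 1 false negative, and 4 true negatives, so accuracy is 6/8 while sensitivity and Dice are both 2/3. Dice ignores true negatives, which is why it is usually preferred over accuracy when the foreground region is small.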
Abstract
Medical image segmentation plays a crucial role in clinical diagnosis and treatment planning. Although models based on convolutional neural networks (CNNs) and Transformers have achieved remarkable success in medical image segmentation tasks, they still face challenges such as high computational complexity and the loss of local features when capturing long-range dependencies. To address these limitations, we propose Med-TTT, a visual backbone network integrated with Test-Time Training (TTT) layers, which incorporates dynamic adjustment capabilities. Med-TTT introduces the Vision-TTT layer, which enables effective modeling of long-range dependencies with linear computational complexity and adaptive parameter adjustment during inference. Furthermore, we designed a multi-resolution fusion mechanism to combine image features at different scales, facilitating the identification of subtle…
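The abstract does not spell out how the Vision-TTT layer works, so the following is only a minimal sketch of the general test-time training idea with a linear inner model, not the paper's layer. The hidden state is a weight matrix `W` that takes one gradient step per token on a self-supervised reconstruction loss, which is what gives linear cost in sequence length and parameter adaptation during inference. The names `ttt_linear`, `theta_k`, `theta_v`, `theta_q`, and `lr`, the zero initialization, and the squared-error loss are all assumptions made for this illustration.

```python
import numpy as np


def ttt_linear(tokens, theta_k, theta_v, theta_q, lr=0.1):
    """Generic TTT-style layer with a linear inner model (illustrative sketch).

    tokens:  array of shape (T, d), one row per token (e.g. an image patch).
    theta_*: projection matrices of shape (d_h, d); fixed during inference.
    Returns the (T, d_h) outputs and the final inner weights W.
    """
    d_h = theta_k.shape[0]
    W = np.zeros((d_h, d_h))          # hidden state = inner model weights
    outputs = []
    for x in tokens:                  # one pass: cost is linear in T
        k, v, q = theta_k @ x, theta_v @ x, theta_q @ x
        err = W @ k - v               # reconstruction residual
        grad = 2.0 * np.outer(err, k) # gradient of ||W k - v||^2 w.r.t. W
        W = W - lr * grad             # inner update performed at test time
        outputs.append(W @ q)         # read out with the updated weights
    return np.stack(outputs), W


# Toy usage: the same unit-norm token repeated four times, identity projections.
x = np.array([1.0, 0.0, 0.0])
tokens = np.tile(x, (4, 1))
eye = np.eye(3)
out, W_final = ttt_linear(tokens, eye, eye, eye, lr=0.25)
print(np.linalg.norm(out - tokens, axis=1))
```

With these toy settings the reconstruction error halves at each step, showing how the inner weights adapt to the input sequence while it is being processed. A real layer would typically learn the projections and the inner learning rate, and would process patches in mini-batches.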
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging
