TL;DR
This paper introduces a method to select the optimal input patch size for vision transformers in tumor segmentation, improving accuracy especially for small lesions by using a volume-based approach and transfer learning.
Contribution
It proposes a novel technique to determine the best patch size based on lesion volume, enhancing vision transformer performance in medical image segmentation.
Findings
Optimal patch size improves segmentation accuracy for small tumors.
Transfer learning with larger tumor volumes enhances performance on smaller lesions.
The method demonstrates consistent improvements on multi-resolution datasets.
Abstract
Detection of tumors in metastatic colorectal cancer (mCRC) plays an essential role in the early diagnosis and treatment of liver cancer. Deep learning models built on fully convolutional neural networks (FCNNs) have become the dominant approach for segmenting 3D computed tomography (CT) scans. However, because their convolution layers have a limited kernel size, they cannot capture long-range dependencies and global context. Vision transformers have been introduced to overcome this locality of FCNN receptive fields. Although transformers can capture long-range features, their segmentation performance degrades when tumor sizes vary, because the models are sensitive to the input patch size. While finding an optimal patch size improves the performance of vision transformer-based models on segmentation tasks, it is a time-consuming and challenging…
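To make the patch-size sensitivity described in the abstract concrete, the following is a minimal sketch, not the paper's method. It only illustrates standard non-overlapping ViT-style tokenization of a 3D volume: how the patch size sets the number of tokens, and how little of a single patch a small lesion can fill as patches grow. The function names, the 96×96×96 crop, and the 200-voxel lesion are all hypothetical values chosen for illustration.

```python
from math import prod


def token_grid(volume_shape, patch_size):
    """Patches per axis when a volume is cut into non-overlapping cubic patches."""
    if any(dim % patch_size for dim in volume_shape):
        raise ValueError("each volume dimension must be divisible by patch_size")
    return tuple(dim // patch_size for dim in volume_shape)


def lesion_fill_fraction(lesion_voxels, patch_size):
    """Upper bound on the fraction of one cubic patch a lesion can occupy."""
    return min(1.0, lesion_voxels / patch_size ** 3)


# Hypothetical inputs, not taken from the paper.
volume_shape = (96, 96, 96)  # assumed CT crop, in voxels
lesion_voxels = 200          # assumed small-lesion volume, in voxels

for patch_size in (8, 16, 32):
    grid = token_grid(volume_shape, patch_size)
    fill = lesion_fill_fraction(lesion_voxels, patch_size)
    print(f"patch={patch_size:2d}  grid={grid}  tokens={prod(grid)}  "
          f"max lesion share of one patch={fill:.4f}")
```

Under these assumptions, larger patches give fewer tokens, and a small lesion becomes a shrinking fraction of the patch that contains it, which is one intuition for why a patch size matched to lesion volume could matter.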
Taxonomy
Methods: Convolution
