Effect of Patch Size on Fine-Tuning Vision Transformers in Two-Dimensional and Three-Dimensional Medical Image Classification

Massoud Dehghan; Ramona Woitek; Amirreza Mahbod

arXiv:2602.18614 · cs.CV · February 24, 2026

TL;DR

This study systematically evaluates how the patch size of Vision Transformers (ViTs) affects classification accuracy in 2D and 3D medical imaging. It finds that smaller patches generally improve performance, at a higher computational cost.

Contribution

It provides a comprehensive analysis of patch size effects on ViT performance in medical imaging, highlighting the benefits of smaller patches and ensemble strategies.

Findings

01. Smaller patch sizes (1, 2, 4) improve classification accuracy.

02. Performance gains of up to 12.78% on 2D datasets and 23.78% on 3D datasets.

03. An ensemble of small-patch models further improves accuracy.
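The summary does not say how the small-patch models are combined. The sketch below uses soft voting, which averages each model's softmax probabilities and picks the highest-scoring class. This is a common ensembling rule, but it is assumed here and is not confirmed to be the authors' method. The function names and the example logits are hypothetical.

```python
"""Soft-voting ensemble: average per-class probabilities from several models.

Hypothetical sketch: the paper summary does not state its ensembling rule;
mean probability averaging is assumed here as one common choice.
"""
import math


def softmax(logits):
    """Numerically stable softmax over a single list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(per_model_logits):
    """Average softmax probabilities across models; return (mean_probs, predicted_class)."""
    probs = [softmax(logits) for logits in per_model_logits]
    n_models = len(probs)
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: mean[c])
    return mean, best


if __name__ == "__main__":
    # Hypothetical logits for one image from three models (e.g. patch sizes 1, 2, 4), 3 classes.
    logits = [
        [2.0, 1.0, 0.1],
        [0.5, 2.5, 0.2],
        [1.8, 1.7, 0.0],
    ]
    probs, label = ensemble_predict(logits)
    print("mean probabilities:", probs, "predicted class:", label)
```

Averaging probabilities rather than taking a majority vote lets a confident model outweigh uncertain ones. Majority voting and logit averaging are alternatives.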

Abstract

Vision Transformers (ViTs) and their variants have become state-of-the-art in many computer vision tasks and are widely used as backbones in large-scale vision and vision-language foundation models. While substantial research has focused on architectural improvements, the impact of patch size, a crucial initial design choice in ViTs, remains underexplored, particularly in medical domains where both two-dimensional (2D) and three-dimensional (3D) imaging modalities exist. In this study, using 12 medical imaging datasets from various imaging modalities (including seven 2D and five 3D datasets), we conduct a thorough evaluation of how different patch sizes affect ViT classification performance. Using a single graphics processing unit (GPU) and a range of patch sizes (1, 2, 4, 7, 14, 28), we fine-tune ViT models and observe consistent improvements in classification performance with…
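Smaller patches cost more compute because a ViT splits each input into non-overlapping patches, and each patch becomes one token. Self-attention then compares every token with every other token. The sketch below counts tokens and attention pairs for each listed patch size. It assumes hypothetical 28×28 (2D) and 28×28×28 (3D) inputs, since every listed patch size divides 28. The summary does not state the input resolutions actually used.

```python
"""Token counts and rough self-attention cost for ViT patch sizes.

Assumption (hypothetical): 2D inputs are 28x28 and 3D inputs are 28x28x28;
the paper summary does not state the actual input resolutions.
"""

PATCH_SIZES = (1, 2, 4, 7, 14, 28)  # patch sizes listed in the abstract


def num_tokens(input_shape, patch):
    """Number of non-overlapping patches (tokens), excluding any [CLS] token."""
    count = 1
    for dim in input_shape:
        if dim % patch != 0:
            raise ValueError(f"patch size {patch} does not divide dimension {dim}")
        count *= dim // patch
    return count


def attention_pairs(n_tokens):
    """Full self-attention scores every token against every token: N^2 pairs per head and layer."""
    return n_tokens * n_tokens


if __name__ == "__main__":
    for name, shape in (("2D", (28, 28)), ("3D", (28, 28, 28))):
        for p in PATCH_SIZES:
            n = num_tokens(shape, p)
            print(f"{name} patch={p:>2}: tokens={n:>6}, attention pairs={attention_pairs(n):,}")
```

Halving the patch size multiplies the token count by 4 in 2D and by 8 in 3D. Attention cost grows with the square of that count, which is why small patches on 3D volumes become expensive quickly.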

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI