Generalized SAM: Efficient Fine-Tuning of SAM for Variable Input Image Sizes
Sota Kato, Hinako Mitsuoka, Kazuhiro Hotta

TL;DR
This paper introduces Generalized SAM (GSAM), a novel fine-tuning method that enables variable input image sizes for the Segment Anything Model, reducing computational costs while maintaining or improving accuracy.
Contribution
GSAM is the first approach to apply random cropping during SAM fine-tuning, allowing variable input sizes and more efficient training.
Findings
GSAM trains at lower computational cost than existing SAM fine-tuning methods.
GSAM achieves comparable or higher accuracy across various datasets.
Random cropping reduces computational demands during training.
Abstract
There has been a lot of recent research on improving the efficiency of fine-tuning foundation models. In this paper, we propose a novel efficient fine-tuning method that allows the input image size of Segment Anything Model (SAM) to be variable. SAM is a powerful foundational model for image segmentation trained on huge datasets, but it requires fine-tuning to recognize arbitrary classes. The input image size of SAM is fixed at 1024 x 1024, resulting in substantial computational demands during training. Furthermore, the fixed input image size may result in the loss of image information, e.g. due to fixed aspect ratios. To address this problem, we propose Generalized SAM (GSAM). Different from the previous methods, GSAM is the first to apply random cropping during training with SAM, thereby significantly reducing the computational cost of training. Experiments on datasets of various…
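The idea of random cropping during fine-tuning can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `random_crop_pair` and `num_patch_tokens`, the crop sizes, and the use of plain nested lists are all assumptions made for illustration. The patch size of 16 matches SAM's ViT image encoder; how GSAM's actual cost scales with crop size is not stated in the abstract and is only approximated here by counting patch tokens.

```python
import random


def random_crop_pair(image, mask, crop_h, crop_w, rng):
    """Cut the same random window out of an image and its label mask.

    `image` and `mask` are 2D grids (lists of rows) with identical height
    and width. Hypothetical helper, not taken from the GSAM code.
    """
    h, w = len(mask), len(mask[0])
    if crop_h > h or crop_w > w:
        raise ValueError("crop size must not exceed the image size")
    # randint is inclusive on both ends, so the window always fits.
    top = rng.randint(0, h - crop_h)
    left = rng.randint(0, w - crop_w)

    def crop(grid):
        return [row[left:left + crop_w] for row in grid[top:top + crop_h]]

    return crop(image), crop(mask)


def num_patch_tokens(height, width, patch=16):
    """Number of non-overlapping patch tokens a ViT encoder would see."""
    return (height // patch) * (width // patch)


# Toy data: a 64 x 64 "image" and a mask derived from it (assumed sizes).
rng = random.Random(0)
image = [[rng.random() for _ in range(64)] for _ in range(64)]
mask = [[1 if v > 0.5 else 0 for v in row] for row in image]

img_crop, mask_crop = random_crop_pair(image, mask, 32, 48, rng)

# Token counts for the fixed SAM input versus an assumed 256 x 256 crop.
full_tokens = num_patch_tokens(1024, 1024)
crop_tokens = num_patch_tokens(256, 256)
```

Because the image and mask are cut with the same window, pixel-level labels stay aligned with the pixels they describe. Under the patch-size assumption above, a 1024 x 1024 input yields 4096 tokens while a 256 x 256 crop yields 256, which gives a rough sense of why smaller, variable-size inputs lower training cost.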
