BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model
Yiran Song, Qianyu Zhou, Xiangtai Li, Deng-Ping Fan, Xuequan Lu, Lizhuang Ma

TL;DR
BA-SAM introduces a scalable attention mask that adapts to varying image resolutions in the Segment Anything Model, improving zero-shot performance and reducing the need for costly retraining.
Contribution
The paper proposes a novel scaling factor and bias-mode attention mask to enhance SAM's adaptability to different image sizes without structural modifications.
Findings
Significantly reduces the performance degradation SAM shows in zero-shot settings when image sizes vary.
Achieves state-of-the-art results with minimal fine-tuning.
Demonstrates strong generalizability across multiple datasets.
Abstract
In this paper, we address the challenge of image resolution variation for the Segment Anything Model (SAM). SAM, known for its zero-shot generalizability, exhibits performance degradation when faced with datasets with varying image sizes. Previous approaches tend to resize the image to a fixed size or adopt structural modifications, hindering the preservation of SAM's rich prior knowledge. Moreover, such task-specific tuning necessitates a complete retraining of the model, which is costly and unacceptable for deployment in downstream tasks. We instead reformulate this issue as a length extrapolation problem, where the token sequence length varies while the patch size stays consistent for images of different sizes. To this end, we propose Scalable Bias-Mode Attention Mask (BA-SAM) to enhance SAM's adaptability to varying image resolutions while eliminating the need for…
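To make the length extrapolation framing concrete, the sketch below counts how many patch tokens a vision transformer would see at different image sizes when the patch size is held fixed. It is only illustrative arithmetic, not code from the paper: the helper name `num_tokens` is hypothetical, and the patch size of 16 and the listed image sizes are assumptions chosen for the example.

```python
def num_tokens(height: int, width: int, patch_size: int = 16) -> int:
    """Count non-overlapping patches (tokens) for an image of the given size.

    Assumption: the image is split into square patches of `patch_size`
    pixels and both dimensions are exact multiples of the patch size.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be multiples of patch_size")
    return (height // patch_size) * (width // patch_size)


# With the patch size fixed, the token sequence length grows
# quadratically with the image side length.
for side in (256, 512, 1024):
    print(f"{side}x{side} image -> {num_tokens(side, side)} tokens")
```

Under these assumptions, a model trained at one sequence length must attend over a sequence many times longer or shorter at test time, which is the mismatch the abstract describes.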
Taxonomy
Topics: Context-Aware Activity Recognition Systems · Anomaly Detection Techniques and Applications
Methods: Segment Anything Model
