Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes
Ke Zhou, Zhongwei Qiu, Dongmei Fu

TL;DR
This paper introduces MCA-SAM, a novel contrastive learning framework with multi-scale adaptors that significantly improves the performance of the Segment Anything Model in specialized segmentation tasks with limited data.
Contribution
The paper proposes a new multi-scale contrastive adaptor learning method, MCA-SAM, which enhances SAM's adaptability and performance in challenging segmentation domains.
Findings
Outperforms existing methods in camouflage object detection, shadow segmentation, and polyp segmentation.
Achieves a 20.0% MAE improvement on the COD10K dataset.
Achieves a 7.9% mDice improvement on the Kvasir-SEG dataset.
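For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how MAE and mean Dice are commonly computed for segmentation masks. The function names, the 0.5 threshold, and the list-of-rows mask layout are illustrative assumptions, not the paper's evaluation code; the summary also does not say whether the reported improvements are relative or absolute. Lower MAE is better, and higher Dice is better.

```python
def mae(pred, target):
    """Mean absolute error between a probability map and a binary mask.

    Both inputs are lists of rows with values in [0, 1] (assumed layout).
    """
    flat_p = [p for row in pred for p in row]
    flat_t = [t for row in target for t in row]
    return sum(abs(p - t) for p, t in zip(flat_p, flat_t)) / len(flat_p)


def dice(pred, target, threshold=0.5, eps=1e-8):
    """Dice coefficient after binarizing the prediction at `threshold`."""
    flat_p = [1 if p >= threshold else 0 for row in pred for p in row]
    flat_t = [t for row in target for t in row]
    intersection = sum(p * t for p, t in zip(flat_p, flat_t))
    return (2 * intersection + eps) / (sum(flat_p) + sum(flat_t) + eps)


def mean_dice(preds, targets):
    """Average Dice over a list of (prediction, mask) pairs, i.e. mDice."""
    scores = [dice(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


# Toy 2x2 example (hypothetical values, not from the paper).
pred = [[0.9, 0.1], [0.8, 0.4]]
mask = [[1, 0], [1, 1]]
print("MAE:", mae(pred, mask))
print("Dice:", dice(pred, mask))
```

In the toy example the prediction misses one foreground pixel, so Dice falls below 1 while MAE reflects the average per-pixel probability error.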
Abstract
Foundational vision models, such as the Segment Anything Model (SAM), have achieved significant breakthroughs through extensive pre-training on large-scale visual datasets. Despite their general success, these models may fall short in specialized tasks with limited data, and fine-tuning such large-scale models is often not feasible. Current strategies involve incorporating adaptors into the pre-trained SAM to facilitate downstream task performance with minimal model adjustment. However, these strategies can be hampered by suboptimal learning approaches for the adaptors. In this paper, we introduce a novel Multi-scale Contrastive Adaptor learning method named MCA-SAM, which enhances adaptor performance through a meticulously designed contrastive learning framework at both token and sample levels. Our Token-level Contrastive adaptor (TC-adaptor) focuses on refining local representations…
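The abstract is truncated here, so the exact losses used by MCA-SAM are not available from this page. As a rough illustration of what contrastive learning "at both token and sample levels" can mean, the sketch below implements a generic InfoNCE-style loss; this is a swapped-in standard technique, not the authors' formulation. The function names, the temperature value, and the choice of positives and negatives are all assumptions. Under this reading, a token-level loss would apply it to patch-token embeddings within an image, and a sample-level loss would apply it to one pooled embedding per image.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor (assumed form, not from the paper).

    Pulls `anchor` toward `positive` and pushes it away from each vector
    in `negatives`. Returns -log softmax probability of the positive.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Hypothetical 2-D embeddings: the anchor matches the positive exactly.
anchor = [1.0, 0.0]
aligned = info_nce(anchor, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce(anchor, [0.0, 1.0], [[1.0, 0.0]])
print("aligned loss:", aligned)
print("misaligned loss:", misaligned)
```

The loss is near zero when the anchor is closest to its positive and grows when a negative is more similar than the positive, which is the behavior any adaptor trained with such an objective would be optimizing.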
Taxonomy
Topics: Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications · Human Pose and Action Recognition
Methods: Segment Anything Model · Contrastive Learning · Masked autoencoder
