AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning
Duojun Huang, Xinyu Xiong, Jie Ma, Jichang Li, Zequn Jie, Lin Ma, Guanbin Li

TL;DR
AlignSAM introduces a reinforcement learning-based framework that automatically generates prompts to adapt the Segment Anything Model to diverse open-world segmentation tasks without retraining, improving accuracy and generalization.
Contribution
This paper presents a reinforcement learning approach to automatic prompting that lets SAM adapt to various tasks while its parameters stay frozen, removing the reliance on user-provided prompts that limits vanilla SAM.
Findings
AlignSAM outperforms state-of-the-art methods on multiple segmentation benchmarks.
The reinforcement learning policy effectively generates prompts that improve segmentation accuracy.
The semantic recalibration module enhances handling of explicit and implicit semantics.
Abstract
Powered by massive curated training data, the Segment Anything Model (SAM) has demonstrated impressive generalization capabilities in open-world scenarios with the guidance of prompts. However, the vanilla SAM is class-agnostic and heavily relies on user-provided prompts to segment objects of interest. Adapting this method to diverse tasks is crucial for accurate target identification and for avoiding suboptimal segmentation results. In this paper, we propose a novel framework, termed AlignSAM, that automatically generates prompts to align SAM to an open context through reinforcement learning. Anchored by an agent, AlignSAM lets SAM generalize across diverse downstream tasks while keeping its parameters frozen. Specifically, AlignSAM initiates a prompting agent that iteratively refines segmentation predictions by interacting with the foundation model. It integrates a…
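The abstract describes a prompting agent that iteratively refines predictions by interacting with a frozen segmentation model. The sketch below illustrates that agent-environment loop in plain Python under several assumptions that are not taken from the paper: `frozen_segmenter` is a hypothetical toy stand-in for SAM, each action is a single positive or negative point prompt on a grid, and the per-step reward is the change in IoU against a ground-truth mask (which would only be available during training). No learning happens here; the `policy` argument marks where a trained RL policy would plug in.

```python
import random
from typing import Callable, List, Set, Tuple

Cell = Tuple[int, int]
Prompt = Tuple[Cell, bool]  # (point, is_positive)


def frozen_segmenter(prompts: List[Prompt], size: int, radius: int = 1) -> Set[Cell]:
    """Hypothetical toy stand-in for a frozen, prompt-driven segmenter such as SAM.

    Cells within `radius` (Chebyshev distance) of a positive point are foreground,
    unless they are also within `radius` of a negative point. Nothing here is trained.
    """
    def near(points: List[Cell]) -> Set[Cell]:
        out: Set[Cell] = set()
        for (r, c) in points:
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < size and 0 <= cc < size:
                        out.add((rr, cc))
        return out

    pos = near([p for p, is_pos in prompts if is_pos])
    neg = near([p for p, is_pos in prompts if not is_pos])
    return pos - neg


def iou(pred: Set[Cell], gt: Set[Cell]) -> float:
    """Intersection over union of two masks represented as sets of cells."""
    union = pred | gt
    return len(pred & gt) / len(union) if union else 1.0


class PromptingEnv:
    """One episode: the agent adds one point prompt per step; the segmenter stays frozen."""

    def __init__(self, gt: Set[Cell], size: int, max_steps: int = 5):
        self.gt, self.size, self.max_steps = gt, size, max_steps
        self.reset()

    def reset(self) -> Set[Cell]:
        self.prompts: List[Prompt] = []
        self.mask: Set[Cell] = set()
        self.score = iou(self.mask, self.gt)
        self.steps = 0
        return self.mask

    def step(self, action: Prompt) -> Tuple[Set[Cell], float, bool]:
        # Re-run the frozen model on the full prompt history, as an interactive segmenter would.
        self.prompts.append(action)
        self.mask = frozen_segmenter(self.prompts, self.size)
        new_score = iou(self.mask, self.gt)
        reward = new_score - self.score  # assumed reward: IoU improvement from this prompt
        self.score = new_score
        self.steps += 1
        return self.mask, reward, self.steps >= self.max_steps


def run_episode(env: PromptingEnv,
                policy: Callable[[Set[Cell], List[Prompt]], Prompt]) -> float:
    """Roll out `policy` for one episode and return the summed reward."""
    mask = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(mask, env.prompts)
        mask, reward, done = env.step(action)
        total += reward
    return total


if __name__ == "__main__":
    size = 8
    gt = {(r, c) for r in range(2, 5) for c in range(2, 6)}
    env = PromptingEnv(gt, size)
    rng = random.Random(0)
    # A random policy, standing in for a learned prompting agent.
    random_policy = lambda mask, prompts: (
        (rng.randrange(size), rng.randrange(size)), rng.random() < 0.7)
    ret = run_episode(env, random_policy)
    print(f"return={ret:.3f}, final IoU={env.score:.3f}")
```

Because each reward is the IoU difference between consecutive steps, the episode return telescopes to the final IoU minus the initial IoU, so maximizing the return under this assumed reward is equivalent to maximizing the quality of the final mask.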
Taxonomy
Topics: Context-Aware Activity Recognition Systems
Methods: Segment Anything Model
