AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning

Duojun Huang; Xinyu Xiong; Jie Ma; Jichang Li; Zequn Jie; Lin Ma; Guanbin Li

arXiv:2406.00480·cs.CV·June 4, 2024·1 cites

TL;DR

AlignSAM introduces a reinforcement learning-based framework that automatically generates prompts to adapt the Segment Anything Model to diverse open-world segmentation tasks without retraining, improving accuracy and generalization.

Contribution

This paper presents a reinforcement learning approach to automatic prompting that lets SAM adapt to various tasks while its parameters stay frozen.

Findings

01. AlignSAM outperforms state-of-the-art methods on multiple segmentation benchmarks.

02. The reinforcement learning policy effectively generates prompts that improve segmentation accuracy.

03. The semantic recalibration module enhances handling of explicit and implicit semantics.

Abstract

Powered by massive curated training data, the Segment Anything Model (SAM) has demonstrated impressive generalization in open-world scenarios under the guidance of prompts. However, vanilla SAM is class-agnostic and relies heavily on user-provided prompts to segment objects of interest. Adapting it to diverse tasks is crucial for accurate target identification and for avoiding suboptimal segmentation results. In this paper, we propose a novel framework, termed AlignSAM, for automatic prompting that aligns SAM to an open context through reinforcement learning. Anchored by an agent, AlignSAM extends SAM's generality across diverse downstream tasks while keeping its parameters frozen. Specifically, AlignSAM initiates a prompting agent that iteratively refines segmentation predictions by interacting with the foundation model. It integrates a…
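
The iterative prompting loop described in the abstract can be pictured with a small toy sketch. This is not the AlignSAM implementation: `frozen_segmenter` is a hypothetical stand-in for a frozen promptable model (it paints or erases square patches on a grid), `lookahead_policy` is a placeholder for a learned policy (it peeks at the target mask, which a trained agent could not do at inference), and the IoU-gain reward is an assumption about how such an agent might be scored.

```python
def iou(a, b):
    """Intersection-over-union of two masks stored as sets of (row, col) pixels."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def frozen_segmenter(prompts, size, radius=1):
    """Toy stand-in for a frozen promptable segmenter (hypothetical, not SAM).

    Each prompt is ((row, col), positive). Positive points add a square patch
    to the mask, negative points erase one. Prompts are applied in order.
    """
    mask = set()
    for (row, col), positive in prompts:
        patch = {
            (r, c)
            for r in range(max(0, row - radius), min(size, row + radius + 1))
            for c in range(max(0, col - radius), min(size, col + radius + 1))
        }
        mask = mask | patch if positive else mask - patch
    return mask


def lookahead_policy(prompts, target, size):
    """Placeholder for a learned policy: try every point prompt, keep the best.

    Assumption for illustration only: it scores candidates against the target
    mask, so it is a one-step oracle rather than a trained agent.
    """
    best_action, best_score = None, -1.0
    for row in range(size):
        for col in range(size):
            for positive in (True, False):
                action = ((row, col), positive)
                score = iou(frozen_segmenter(prompts + [action], size), target)
                if score > best_score:
                    best_action, best_score = action, score
    return best_action


def refine(target, size, steps):
    """Run the prompt-refinement loop; the segmenter itself is never updated."""
    prompts, rewards = [], []
    previous = iou(frozen_segmenter(prompts, size), target)
    for _ in range(steps):
        prompts.append(lookahead_policy(prompts, target, size))
        current = iou(frozen_segmenter(prompts, size), target)
        rewards.append(current - previous)  # reward = IoU gain of this step
        previous = current
    return prompts, rewards, previous


if __name__ == "__main__":
    # A 3x3 target square inside a 6x6 grid.
    target = {(r, c) for r in range(1, 4) for c in range(1, 4)}
    prompts, rewards, final_iou = refine(target, size=6, steps=2)
    print(prompts, rewards, final_iou)
```

Because each reward is the change in IoU, the rewards of an episode sum to the final IoU minus the initial one, so the agent is credited only for prompts that actually improve the mask.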

Taxonomy

Topics: Context-Aware Activity Recognition Systems

Methods: Segment Anything Model