SAM3-I: Segment Anything with Instructions

Jingjing Li; Yue Feng; Yuchen Guo; Jincai Huang; Wei Ji; Qi Bi; Yongri Piao; Miao Zhang; Xiaoqi Zhao; Qiang Chen; Shihao Zou; Huchuan Lu; Li Cheng

arXiv:2512.04585 · cs.CV · April 17, 2026

TL;DR

SAM3-I extends Segment Anything Model 3 (SAM3) to follow complex natural-language instructions for segmentation, unifying concept grounding and instruction reasoning within a single framework.

Contribution

The paper introduces SAM3-I, a unified segmentation framework that directly interprets natural-language instructions, and HMPL-Instruct, a large-scale, instruction-centric training dataset.

Findings

1. SAM3-I achieves strong performance on instruction-following segmentation tasks.
2. The model effectively combines concept grounding with complex instruction understanding.
3. Code and dataset are publicly available on GitHub (debby-0527/SAM3-I).

Abstract

Segment Anything Model 3 (SAM3) advances open-vocabulary segmentation through promptable concept segmentation, enabling users to segment all instances associated with a given concept using short noun-phrase (NP) prompts. While effective for concept-level grounding, real-world interactions often involve far richer natural-language instructions that combine attributes, relations, actions, states, or implicit reasoning. Currently, SAM3 relies on external multimodal agents to convert complex instructions into NPs and then performs iterative mask filtering, which yields coarse representations and limited instance specificity. In this work, we present SAM3-I, an instruction-following extension of the SAM family that unifies concept-level grounding and instruction-level reasoning within a single segmentation framework. Built upon SAM3, SAM3-I introduces an instruction-aware cascaded adaptation…

Code & Models

Repositories

debby-0527/SAM3-I (GitHub)