AnchorSeg: Language Grounded Query Banks for Reasoning Segmentation

Rui Qian; Chuanhang Deng; Qiang Huang; Jian Xiong; Mingxuan Li; Yingbo Zhou; Wei Zhai; Jintao Chen; Dejing Dou

arXiv:2604.18562·cs.CV·April 23, 2026

TL;DR

AnchorSeg takes a structured approach to reasoning segmentation: it conditions mask generation on language grounded query banks that explicitly separate semantic reasoning (what to segment) from spatial localization (where to segment), which improves pixel-level segmentation accuracy.

Contribution

The paper reformulates reasoning segmentation as a structured conditional generation process with explicit spatial grounding, introducing language grounded query banks and a new training objective, Token–Mask Cycle Consistency, for better training alignment.

Findings

01 Achieves state-of-the-art results on ReasonSeg with 67.7% gIoU.

02 Uses explicit language grounded query banks for better reasoning and localization.

03 Proposes Token–Mask Cycle Consistency for improved training alignment.
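
For readers unfamiliar with the gIoU figure quoted in the first finding, the sketch below shows how the metric is usually computed for ReasonSeg-style benchmarks. It assumes the common definition: gIoU is the mean of per-image IoUs, while the companion metric cIoU is the cumulative intersection divided by the cumulative union. The function names, the nested-list mask format, and the handling of empty masks are illustrative assumptions, not taken from the paper or its repository.

```python
from typing import List, Sequence, Tuple

# Assumption: a binary mask is given as rows of 0/1 integers.
Mask = Sequence[Sequence[int]]


def intersection_and_union(pred: Mask, target: Mask) -> Tuple[int, int]:
    """Count pixels in the intersection and union of two binary masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    return inter, union


def giou_and_ciou(preds: Sequence[Mask], targets: Sequence[Mask]) -> Tuple[float, float]:
    """Return (gIoU, cIoU) over a set of predicted and ground-truth masks."""
    per_image: List[float] = []
    total_inter = 0
    total_union = 0
    for pred, target in zip(preds, targets):
        inter, union = intersection_and_union(pred, target)
        # Assumption: two empty masks count as a perfect match.
        per_image.append(inter / union if union > 0 else 1.0)
        total_inter += inter
        total_union += union
    giou = sum(per_image) / len(per_image)  # mean of per-image IoUs
    ciou = total_inter / total_union if total_union > 0 else 1.0
    return giou, ciou


# Two toy 2x2 images: the first prediction over-segments by one pixel.
preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
targets = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
giou, ciou = giou_and_ciou(preds, targets)
```

Because gIoU averages per image, small objects weigh as much as large ones, whereas cIoU is dominated by images with large masks.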

Abstract

Reasoning segmentation requires models to ground complex, implicit textual queries into precise pixel-level masks. Existing approaches rely on a single segmentation token $<SEG>$, whose hidden state implicitly encodes both semantic reasoning and spatial localization, limiting the model's ability to explicitly disentangle what to segment from where to segment. We introduce AnchorSeg, which reformulates reasoning segmentation as a structured conditional generation process over image tokens, conditioned on language grounded query banks. Instead of compressing all semantic reasoning and spatial localization into a single embedding, AnchorSeg constructs an ordered sequence of query banks: latent reasoning tokens that capture intermediate semantic states, and a segmentation anchor token that provides explicit spatial grounding. We model spatial conditioning as a factorized…
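
The contrast the abstract draws, between one hidden state that carries everything and an ordered bank of queries, can be pictured as a difference in data structure. The sketch below is purely illustrative and is not the paper's architecture: the token strings `<R1>`, `<R2>`, and `<ANCHOR>`, the function names, and the dictionary layout are all hypothetical, and hidden states are stood in for by plain lists of floats.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def single_token_query(tokens: Sequence[str], hidden: Sequence[Vector],
                       seg_token: str = "<SEG>") -> Vector:
    """Single-token setup: one hidden state must encode both 'what' and 'where'."""
    return hidden[tokens.index(seg_token)]


def build_query_bank(tokens: Sequence[str], hidden: Sequence[Vector],
                     reasoning_tokens: Sequence[str],
                     anchor_token: str) -> Dict[str, object]:
    """Ordered bank (hypothetical layout): reasoning states first, then the anchor."""
    reasoning = [hidden[tokens.index(t)] for t in reasoning_tokens]
    anchor = hidden[tokens.index(anchor_token)]
    return {"reasoning": reasoning, "anchor": anchor}


# Toy sequences with 1-dimensional "hidden states".
baseline = single_token_query(["a", "<SEG>"], [[0.1], [0.9]])

tokens = ["the", "<R1>", "<R2>", "<ANCHOR>"]
hidden = [[0.0], [1.0], [2.0], [3.0]]
bank = build_query_bank(tokens, hidden, ["<R1>", "<R2>"], "<ANCHOR>")
```

In this picture, a downstream mask decoder would receive the semantic states and the spatial anchor as separate inputs rather than one fused vector; how AnchorSeg actually combines them is specified in the paper, not here.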

Code & Models

Repositories

rui-qian/AnchorSeg (GitHub)
