TL;DR
SegRAG is a training-free, retrieval-augmented segmentation framework that enhances open-vocabulary models such as SAM3 by grounding them with class-specific point prompts retrieved from a DINOv3 feature bank, improving mIoU on four benchmarks.
Contribution
It introduces a retrieval-augmented approach built on two techniques, Intra-Class Cohesion Distillation (ICCD) and Topographic Similarity Grounding (TSG), enabling training-free, zero-shot domain transfer for semantic segmentation.
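As a rough illustration of the ICCD filtering step (the abstract describes it as retaining only prototypes that reliably retrieve within-class foreground), the NumPy sketch below scores each candidate foreground descriptor by how often its top-k nearest patches in the *other* reference images of the same class are annotated foreground. The leave-one-reference-out scoring, the top-k precision criterion, and the names `distill_prototypes`, `k`, and `min_precision` are hypothetical choices for illustration, not the paper's definition.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def distill_prototypes(feats, masks, k=5, min_precision=0.8):
    """Hypothetical ICCD-style filter (criterion is an assumption, not the paper's).

    feats: list of (P, D) arrays, patch descriptors of each reference image of one class.
    masks: list of (P,) boolean arrays, True where the patch is annotated foreground.
    Returns the retained prototypes as an (M, D) array.
    """
    feats = [l2_normalize(f) for f in feats]
    kept = []
    for i, (f_i, m_i) in enumerate(zip(feats, masks)):
        # Candidate prototypes are the foreground patches of reference i.
        for proto in f_i[m_i]:
            hits, total = 0, 0
            for j, (f_j, m_j) in enumerate(zip(feats, masks)):
                if j == i:
                    continue  # score only against other references (no self-match)
                sims = f_j @ proto              # cosine similarity to every patch of ref j
                top = np.argsort(-sims)[:k]     # indices of the k most similar patches
                hits += int(m_j[top].sum())     # how many of them are foreground
                total += len(top)
            # Keep the prototype only if its retrievals are mostly within-class foreground.
            if total and hits / total >= min_precision:
                kept.append(proto)
    d = feats[0].shape[1]
    return np.stack(kept) if kept else np.empty((0, d))
```

Scoring against other references rather than the prototype's own image avoids a trivial self-match; a foreground-labeled patch that actually resembles background retrieves background elsewhere and is dropped.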
Findings
Outperforms the text-only baseline on four benchmarks, with gains of up to +3.92 mIoU on LVIS.
Substantially improves zero-shot domain transfer, raising mIoU from 25.27 to 59.24 (+33.97).
Ablation studies confirm each component's contribution to overall performance.
Abstract
Open-vocabulary segmentation models such as SAM3 perform well across broad categories via text prompting, yet degrade when target classes are visually underrepresented in pretraining or depart from canonical depictions, limitations that text prompts cannot resolve spatially. We present SegRAG, a training-free retrieval-augmented segmentation framework that grounds SAM3 with class-specific point prompts derived from a curated DINOv3 feature bank. Offline, dense patch-level descriptors are extracted from annotated references and filtered by Intra-Class Cohesion Distillation (ICCD), retaining only prototypes that reliably retrieve within-class foreground. At inference, Topographic Similarity Grounding (TSG) computes a cosine-similarity landscape against retrieved prototypes, identifies coherent high-confidence regions via connected-component analysis, and extracts peak locations through…
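The TSG inference stage can be sketched as follows, assuming query patch features on an H x W grid and a set of retained prototypes. Taking the per-patch maximum over prototypes, the threshold `tau`, the `min_area` filter, using the single highest-similarity patch of each component as its peak, and the patch-centre pixel conversion with `patch_size` are all illustrative assumptions, since the abstract is truncated before it specifies how peaks are extracted. Connected components are labelled with a plain 4-connected BFS to keep the example dependency-light.

```python
from collections import deque

import numpy as np


def similarity_landscape(query: np.ndarray, prototypes: np.ndarray) -> np.ndarray:
    """query: (H, W, D) patch features; prototypes: (M, D).

    Returns an (H, W) map holding, per patch, the max cosine similarity to any prototype.
    """
    q = query / (np.linalg.norm(query, axis=-1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=-1, keepdims=True) + 1e-8)
    return (q @ p.T).max(axis=-1)  # (H, W, M) -> (H, W)


def connected_components(mask: np.ndarray):
    """4-connected component labelling via BFS; returns a list of [(row, col), ...]."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    comps = []
    for r in range(h):
        for c in range(w):
            if mask[r, c] and not seen[r, c]:
                comp, queue = [], deque([(r, c)])
                seen[r, c] = True
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                comps.append(comp)
    return comps


def point_prompts(query, prototypes, tau=0.6, min_area=2, patch_size=16):
    """Hypothetical TSG-style grounding: one (x, y) pixel point per coherent region."""
    sim = similarity_landscape(query, prototypes)
    points = []
    for comp in connected_components(sim >= tau):  # high-confidence regions
        if len(comp) < min_area:
            continue  # drop isolated, likely spurious responses
        y, x = max(comp, key=lambda rc: sim[rc])   # peak patch of this region
        # Convert the patch index to the pixel coordinates of the patch centre.
        points.append(((x + 0.5) * patch_size, (y + 0.5) * patch_size))
    return points
```

Requiring a minimum component area trades recall on very small objects for robustness against single-patch false matches; emitting one peak per region yields one point prompt per coherent object-like area rather than many redundant points.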
