From Local Matches to Global Masks: Template-Guided Instance Detection and Segmentation in Open-World Scenes

Qifan Zhang; Sai Haneesh Allu; Jikai Wang; Yangxiao Lu; Yu Xiang

arXiv:2603.03577·cs.CV·May 15, 2026

TL;DR

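This paper introduces L2G-Det, a detection framework that locates and segments novel object instances in cluttered, unseen scenes without relying on object proposals. It densely matches patches between template images and the query image, then prompts a Segment Anything Model (SAM) with the filtered match points.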

Contribution

The paper presents a local-to-global detection approach that bypasses proposals and integrates with SAM for accurate instance segmentation in open-world environments.

Findings

01. L2G-Det outperforms proposal-based methods in open-world detection tasks.

02. Dense patch matching effectively locates objects under occlusion and clutter.

03. The method reliably reconstructs complete object masks in challenging scenes.

Abstract

Detecting and segmenting novel object instances in open-world environments is a fundamental problem in robotic perception. Given only a small set of template images, a robot must locate and segment a specific object instance in a cluttered, previously unseen scene. Existing proposal-based approaches are highly sensitive to proposal quality and often fail under occlusion and background clutter. We propose L2G-Det, a local-to-global instance detection framework that bypasses explicit object proposals by leveraging dense patch-level matching between templates and the query image. Locally matched patches generate candidate points, which are refined through a candidate selection module to suppress false positives. The filtered points are then used to prompt an augmented Segment Anything Model (SAM) with instance-specific object tokens, enabling reliable reconstruction of complete instance…
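The local matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes patch features are already available as plain vectors, swaps in cosine similarity with a mutual nearest-neighbour check as a stand-in for the paper's matching and candidate selection, and uses hypothetical names and defaults (`candidate_points`, `patch_size=14`, `min_sim=0.8`). The SAM prompting stage is not covered.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def candidate_points(template_feats, query_feats, grid_w,
                     patch_size=14, min_sim=0.8):
    """Turn local patch matches into (x, y, score) candidate points.

    template_feats: one feature vector per template patch.
    query_feats: one feature vector per query patch, in row-major
        order over a patch grid that is grid_w patches wide.
    patch_size and min_sim are illustrative assumptions, not values
    taken from the paper.
    """
    if not template_feats or not query_feats:
        return []
    # Dense similarity: every template patch against every query patch.
    sim = [[cosine(t, q) for q in query_feats] for t in template_feats]
    points = []
    for ti, row in enumerate(sim):
        # Best query patch for this template patch.
        qi = max(range(len(row)), key=row.__getitem__)
        # Mutual check: that query patch must also prefer this template patch.
        back = max(range(len(sim)), key=lambda k: sim[k][qi])
        if back == ti and row[qi] >= min_sim:
            r, c = divmod(qi, grid_w)
            # Report the pixel centre of the matched query patch.
            points.append(((c + 0.5) * patch_size,
                           (r + 0.5) * patch_size,
                           row[qi]))
    return points


if __name__ == "__main__":
    template = [[1, 0], [0, 1]]
    query = [[0, 1], [1, 0], [1, 1], [-1, 0]]  # 2x2 patch grid
    for x, y, score in candidate_points(template, query, grid_w=2):
        print(f"candidate at ({x:.1f}, {y:.1f}) score={score:.2f}")
```

The mutual check and the similarity threshold play the role of a simple false-positive filter here; the surviving points are the kind of input that would then be passed to a point-prompted segmentation model.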
