Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation

Yangxiao Lu, Jishnu Jaykumar P, Yunhui Guo, Nicholas Ruozzi, and Yu Xiang

arXiv:2405.17859 · cs.CV · March 6, 2025

TL;DR

This paper introduces NIDS-Net, a unified framework that leverages large vision models and a novel weight adapter to improve novel instance detection and segmentation, achieving state-of-the-art results across multiple datasets.

Contribution

The paper presents a new framework combining large vision models with a weight adapter for high-quality embeddings, enhancing few-shot novel instance detection and segmentation performance.
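
The summary does not spell out the adapter's exact form. The sketch below is a rough, hypothetical illustration of what a lightweight embedding adapter of this kind can look like. A linear layer followed by a sigmoid produces a per-dimension weight, which rescales the embedding. The rescaled vector is blended with the original through a residual ratio `alpha` and then L2-normalized. The class name `WeightAdapterSketch`, the gating design, `alpha`, and the random initialization are all assumptions, not the authors' implementation.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class WeightAdapterSketch:
    """Hypothetical embedding adapter (an assumption, not NIDS-Net's exact design).

    A linear layer followed by a sigmoid produces a per-dimension weight in (0, 1);
    the weighted embedding is blended with the original through a residual ratio
    `alpha`, and the result is L2-normalized for cosine matching.
    """

    def __init__(self, dim: int, alpha: float = 0.5, seed: int = 0):
        rng = random.Random(seed)
        # Randomly initialized weights W (dim x dim) and bias b; these would be trained in practice.
        self.w = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(dim)]
        self.b = [0.0] * dim
        self.alpha = alpha

    def __call__(self, x: list[float]) -> list[float]:
        # Per-dimension gate: g = sigmoid(W x + b)
        gate = [
            sigmoid(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(self.w, self.b)
        ]
        # Residual blend, element-wise: alpha * (g * x) + (1 - alpha) * x
        y = [self.alpha * g_i * x_i + (1.0 - self.alpha) * x_i for g_i, x_i in zip(gate, x)]
        norm = math.sqrt(sum(v * v for v in y)) or 1.0
        return [v / norm for v in y]


adapter = WeightAdapterSketch(dim=4, alpha=0.5)
refined = adapter([0.2, -0.1, 0.4, 0.3])
print([round(v, 3) for v in refined])
```

Because the gate only rescales each dimension by a positive factor, this kind of adapter nudges an embedding within its local neighborhood rather than mapping it to an unrelated point. With `alpha=0` the sketch reduces to plain normalization of the input.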

Findings

01. Outperforms current state-of-the-art methods on four detection datasets.
02. Achieves superior segmentation results on BOP challenge datasets.
03. Demonstrates effectiveness on real-world images from robots and cameras.

Abstract

Novel Instance Detection and Segmentation (NIDS) aims to detect and segment novel object instances given a few examples of each instance. We propose a unified, simple, yet effective framework (NIDS-Net) comprising object proposal generation, embedding creation for both instance templates and proposal regions, and embedding matching for instance label assignment. Leveraging recent advances in large vision models, we use Grounding DINO and the Segment Anything Model (SAM) to obtain object proposals with accurate bounding boxes and masks. Central to our approach is the generation of high-quality instance embeddings. We average the foreground patch embeddings from the DINOv2 ViT backbone and then refine them with a weight adapter mechanism that we introduce. We show experimentally that our weight adapter can adjust the embeddings locally within their…
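
The embedding and matching steps can be sketched in pure Python as follows. The sketch assumes that the per-patch features (for example, DINOv2 patch tokens) have already been extracted and that each SAM mask has been downsampled to the same patch grid. It averages the foreground patches into one unit-length embedding. Each proposal then receives the label of its most cosine-similar template. The function names, the nearest-template rule, and the 0.5 rejection threshold are illustrative assumptions, not the authors' code. The weight adapter step is omitted here.

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def foreground_embedding(patch_feats: list[list[float]], patch_mask: list[bool]) -> list[float]:
    """Average the patch embeddings covered by the object mask, then L2-normalize.

    patch_feats: one feature vector per ViT patch (e.g. DINOv2 patch tokens).
    patch_mask:  the object mask downsampled to the same patch grid.
    """
    fg = [f for f, m in zip(patch_feats, patch_mask) if m]
    if not fg:
        raise ValueError("mask covers no patches")
    mean = [sum(f[d] for f in fg) / len(fg) for d in range(len(fg[0]))]
    return l2_normalize(mean)


def assign_labels(proposals, templates, template_labels, threshold=0.5):
    """Label each proposal with its most similar template (hypothetical rule).

    All embeddings are unit-length, so the dot product equals cosine similarity.
    Proposals whose best score is below `threshold` are labeled None (background).
    """
    results = []
    for p in proposals:
        scores = [sum(a * b for a, b in zip(p, t)) for t in templates]
        best = max(range(len(scores)), key=scores.__getitem__)
        label = template_labels[best] if scores[best] >= threshold else None
        results.append((label, scores[best]))
    return results


# Toy 2-D features for a 4-patch image; real DINOv2 features are much wider.
# For brevity, templates and the proposal are cut from the same feature map.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mug = foreground_embedding(feats, [True, True, False, False])
box = foreground_embedding(feats, [False, False, True, True])
proposal = foreground_embedding(feats, [True, False, False, False])
print(assign_labels([proposal], [mug, box], ["mug", "box"]))
```

Normalizing every embedding once, up front, lets the matching loop use a plain dot product as the cosine similarity.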

Code & Models

Repositories

youngsean/nids-net
PyTorch · Official

Taxonomy

Topics: Industrial Vision Systems and Defect Detection · Image Processing and 3D Reconstruction · Image and Object Detection Techniques

Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Residual Connection · Multi-Head Attention · Dense Connections · Vision Transformer · Adapter · Self-Distillation with No Labels (DINO)