Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation
Yangxiao Lu, Jishnu Jaykumar P, Yunhui Guo, Nicholas Ruozzi, and Yu Xiang

TL;DR
This paper introduces NIDS-Net, a unified framework that leverages large vision models and a novel weight adapter to improve novel instance detection and segmentation, achieving state-of-the-art results across multiple datasets.
Contribution
The paper presents a new framework combining large vision models with a weight adapter for high-quality embeddings, enhancing few-shot novel instance detection and segmentation performance.
Findings
Outperforms current state-of-the-art methods on four detection datasets.
Achieves superior segmentation results on BOP challenge datasets.
Demonstrates effectiveness on real-world images from robots and cameras.
Abstract
Novel Instance Detection and Segmentation (NIDS) aims at detecting and segmenting novel object instances given a few examples of each instance. We propose a unified, simple, yet effective framework (NIDS-Net) comprising object proposal generation, embedding creation for both instance templates and proposal regions, and embedding matching for instance label assignment. Leveraging recent advancements in large vision methods, we utilize Grounding DINO and the Segment Anything Model (SAM) to obtain object proposals with accurate bounding boxes and masks. Central to our approach is the generation of high-quality instance embeddings. We utilize foreground feature averages of patch embeddings from the DINOv2 ViT backbone, followed by refinement through a weight adapter mechanism that we introduce. We show experimentally that our weight adapter can adjust the embeddings locally within their…
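The embedding and matching steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes patch embeddings and a binary foreground mask are already available as plain Python lists, it omits the weight adapter entirely, and it assumes that matching is done by picking the template with the highest cosine similarity, which the abstract does not specify. The function names `foreground_average`, `cosine_similarity`, and `assign_label` are hypothetical.

```python
import math


def foreground_average(patch_embeddings, mask):
    """Average the patch embeddings whose mask entry is truthy (foreground)."""
    selected = [vec for vec, keep in zip(patch_embeddings, mask) if keep]
    if not selected:
        raise ValueError("mask selects no foreground patches")
    dim = len(selected[0])
    return [sum(vec[i] for vec in selected) / len(selected) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def assign_label(proposal_embedding, template_embeddings):
    """Return (label, score) for the most similar template embedding.

    Assumption: nearest-template matching by cosine similarity.
    """
    scores = {
        label: cosine_similarity(proposal_embedding, emb)
        for label, emb in template_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy data: two 2-D template embeddings and one proposal with three patches,
# the last of which is background and is excluded by the mask.
templates = {"mug": [1.0, 0.0], "box": [0.0, 1.0]}
patches = [[0.9, 0.1], [1.1, -0.1], [0.0, 5.0]]
mask = [1, 1, 0]

proposal = foreground_average(patches, mask)
label, score = assign_label(proposal, templates)
print(label, round(score, 3))
```

In this toy case the background patch points toward "box", so averaging over all patches would pull the proposal embedding away from "mug"; restricting the average to foreground patches is what keeps the match clean.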
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Image Processing and 3D Reconstruction · Image and Object Detection Techniques
Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Residual Connection · Multi-Head Attention · Dense Connections · Vision Transformer · Adapter · self-DIstillation with NO labels
