NOCTIS: Novel Object Cyclic Threshold based Instance Segmentation

Max Gandyra; Alessandro Santonicola; Michael Beetz

arXiv:2507.01463 · cs.CV · December 3, 2025

TL;DR

NOCTIS is a training-free instance segmentation framework that combines pre-trained models with a cyclic thresholding mechanism to accurately segment novel objects in RGB images.

Contribution

It introduces a novel cyclic thresholding method and an RGB-only pipeline that outperform existing RGB and RGB-D methods on unseen object segmentation tasks.

Findings

01. Outperforms state-of-the-art methods on BOP 2023 datasets
02. Does not require further training or fine-tuning
03. Works effectively with only RGB data

Abstract

Instance segmentation of novel object instances in RGB images, given some example images for each object, is a well-known problem in computer vision. Designing a model general enough to be employed for all kinds of novel objects without (re-)training has proven to be a difficult task. To handle this, we present a new training-free framework called Novel Object Cyclic Threshold based Instance Segmentation (NOCTIS). NOCTIS integrates two pre-trained models: Grounded-SAM 2, for object proposals with precise bounding boxes and corresponding segmentation masks; and DINOv2, for robust class and patch embeddings owing to its zero-shot capabilities. Internally, proposal-object matching is realized by computing an object matching score from the similarity of the class embeddings and the average maximum similarity of the patch embeddings with a new cyclic thresholding (CT) mechanism…
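As a rough illustration of the matching score described in the abstract, the Python sketch below combines a class-embedding similarity with an average maximum patch similarity. It is a minimal sketch under stated assumptions, not the NOCTIS implementation: the cosine metric, the equal-weight combination (`w_class`), and all function names are hypothetical, the cyclic thresholding step is omitted, and plain Python lists stand in for DINOv2 embeddings.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def avg_max_patch_similarity(proposal_patches: Sequence[Vector],
                             template_patches: Sequence[Vector]) -> float:
    """For each proposal patch, take its best cosine match among the template
    patches, then average these maxima over all proposal patches."""
    if not proposal_patches or not template_patches:
        return 0.0
    best = [max(cosine(p, t) for t in template_patches) for p in proposal_patches]
    return sum(best) / len(best)


def object_matching_score(proposal_cls: Vector, template_cls: Vector,
                          proposal_patches: Sequence[Vector],
                          template_patches: Sequence[Vector],
                          w_class: float = 0.5) -> float:
    """HYPOTHETICAL aggregation: convex combination of the class-embedding
    similarity and the average maximum patch similarity."""
    s_cls = cosine(proposal_cls, template_cls)
    s_patch = avg_max_patch_similarity(proposal_patches, template_patches)
    return w_class * s_cls + (1.0 - w_class) * s_patch


# Toy example: identical class embeddings, but only one of two proposal
# patches has a good match in the template.
print(object_matching_score([1.0, 0.0], [1.0, 0.0],
                            [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]]))  # 0.75
```

Averaging per-patch maxima rewards proposals whose patches each find some counterpart in the template, rather than requiring a one-to-one patch correspondence; how NOCTIS actually weights the two terms is not stated in this summary.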

Peer Reviews

Decision: ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1) Proposes a training-free method for novel-class instance segmentation.
2) Proposes the cyclic thresholding method to mitigate the multi-to-one matching problem caused by strict matching.
3) Achieves SOTA or SOTA-comparable performance.
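One way to read the multi-to-one problem the reviewer mentions: with strict cycle (mutual nearest-neighbour) matching, several proposal patches on a repetitive texture can all pick the same template patch, but that template patch points back to only one of them, so the others are discarded. The sketch below is a hypothetical illustration of relaxing that check with a spatial threshold. The forward/backward cycle, the Euclidean grid distance, and the names `cyclic_matches` and `argmax` are assumptions, not the NOCTIS implementation in code-iai/noctis.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda k: values[k])


def cyclic_matches(proposal_patches: Sequence[Vector],
                   proposal_coords: Sequence[Tuple[float, float]],
                   template_patches: Sequence[Vector],
                   threshold: float) -> List[int]:
    """Indices of proposal patches whose proposal -> template -> proposal
    cycle ends within `threshold` grid units of where it started.

    HYPOTHETICAL reading of cyclic thresholding: threshold = 0 is strict
    mutual-nearest-neighbour matching; larger values tolerate returning to a
    nearby patch, e.g. on repetitive texture.
    """
    # sim[i][j]: similarity of proposal patch i to template patch j
    sim = [[cosine(p, t) for t in template_patches] for p in proposal_patches]
    kept = []
    for i in range(len(proposal_patches)):
        j = argmax(sim[i])                        # forward step: best template patch
        i_back = argmax([row[j] for row in sim])  # backward step: best proposal patch for j
        (x0, y0), (x1, y1) = proposal_coords[i], proposal_coords[i_back]
        if math.hypot(x1 - x0, y1 - y0) <= threshold:
            kept.append(i)
    return kept


# Toy example: patches 0 and 1 are near-duplicates (repetitive texture), so both
# map to template patch 0, which maps back only to patch 0.
proposal = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
coords = [(0, 0), (1, 0), (5, 5)]
template = [[1.0, 0.0], [0.0, 1.0]]
print(cyclic_matches(proposal, coords, template, threshold=0.0))  # [0, 2]
print(cyclic_matches(proposal, coords, template, threshold=1.0))  # [0, 1, 2]
```

With the strict check, patch 1 is rejected even though it sits right next to a valid match; a threshold of one grid unit keeps it.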

Weaknesses

1) What is the relationship between the task defined in this paper and open-set/open-vocabulary instance segmentation? The authors claim it is infeasible to train an instance segmentor that covers sufficiently many instances, yet this is precisely what open-set/open-vocabulary instance segmentation aims to achieve. These tasks also build their models on the generalization capabilities of powerful pre-trained models such as SAM and DINO.
2) The related work section also lacks a compa

Reviewer 02 · Rating 4 · Confidence 2

Strengths

- Performance: This paper successfully integrates multiple methodological approaches, thereby translating the generalization capability of foundation models into state-of-the-art performance in the BOP 2023 challenge.
- Reproducibility: The paper provides a detailed description of the experimental setup (e.g., software versions used, random seed, hardware configuration, and running time), which is very helpful for ensuring good reproducibility of the results.
- One of the

Weaknesses

1. Lack of novelty: The proposed method appears to rely primarily on integrating minor innovations (e.g., confidence and score aggregation) on top of existing foundation models. It is largely built upon prior works such as CNOS [1] and SAM-6D [2], resulting in limited incremental novelty.
2. Although the Cyclic Thresholding (CT) mechanism is highlighted as a major innovation, its actual performance gain is extremely limited (0.512 to 0.516 in the ablation), which is disproportionate to the
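For scale, the ablation numbers quoted by the reviewer correspond to the following absolute and relative gain, assuming both values are on the same metric scale:

```latex
\Delta = 0.516 - 0.512 = 0.004, \qquad
\frac{\Delta}{0.512} = 0.0078125 \approx 0.78\,\%
```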

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. SOTA on RGB-only: Achieved SOTA on the BOP benchmark using only RGB images, outperforming methods that rely on RGB-D (depth) data.
2. Novel CT matching algorithm: This paper introduced the "Cyclic Thresholding" (CT) mechanism, a new and effective algorithm that addresses DINOv2's matching instability on repetitive textures.

Weaknesses

First, the paper's core premise of "novelty" is questionable. The framework relies heavily on foundation models (GSAM 2 and DINOv2) that were pre-trained on massive datasets. It is highly probable that these models have already "seen" the object categories present in the BOP benchmark. Therefore, the "zero-shot" capability claimed is more a feat of the models' generalization than true segmentation of unseen objects. Second, the innovation is incremental and best described as a clever engineerin

Reviewer 04 · Rating 4 · Confidence 3

Strengths

1. The proposed approach is straightforward and not difficult to understand.
2. Experiments on NOCTIS yield impressive results.

Weaknesses

1. The overall contribution, which I believe centers on Eq. 4, appears limited for an ICLR submission.
2. The claimed contribution of "removing selection bias" is not supported by experiments.
3. The proposed approach introduces additional parameters such as CT and $w_{appe}$, which make parameter tuning harder in real-world use.
4. (Minor) The paper's language style reads more like a speech than a research paper.

Code & Models

Repositories

code-iai/noctis
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Face recognition and analysis · Advanced Image and Video Retrieval Techniques