Enhancing Object Discovery for Unsupervised Instance Segmentation and Object Detection
Xingyu Feng, Hebei Gao, Hong Li

TL;DR
COLER introduces a novel, simple, and effective unsupervised approach for object discovery that leverages a single normalized cut step and self-supervised learning, outperforming previous methods on multiple benchmarks.
Contribution
The paper presents COLER, a zero-shot unsupervised model that uses a new CutOnce method with a single NCut application, eliminating the need for clustering or mask post-processing.
Findings
Outperforms previous state-of-the-art on multiple benchmarks.
Achieves strong performance without specialized loss functions.
Leverages self-supervised models for improved object discovery.
Abstract
We propose Cut-Once-and-LEaRn (COLER), a simple approach for unsupervised instance segmentation and object detection. COLER first uses our developed CutOnce to generate coarse pseudo labels, then enables the detector to learn from these masks. CutOnce applies Normalized Cut (NCut) only once and does not rely on any clustering methods (e.g., K-Means), yet it can generate multiple object masks in an image. Our work opens a new direction for the NCut algorithm in multi-object segmentation. We have designed several novel yet simple modules that not only allow CutOnce to fully leverage the object discovery capabilities of self-supervised models, but also free it from reliance on mask post-processing. During training, COLER achieves strong performance without requiring specially designed loss functions for pseudo labels, and its performance is further improved through self-training. COLER is a…
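To make the idea of a single NCut pass yielding several masks concrete, here is a minimal sketch in Python with NumPy. It is not the authors' CutOnce: the fixed temperature `tau`, the "smaller side is foreground" rule, the 4-connected component split, and all function names are assumptions made for illustration, and the paper's own modules (adaptive temperature, boundary augmentation, rank-based filtering) are omitted. The toy features stand in for patch embeddings from a self-supervised backbone.

```python
from collections import deque

import numpy as np


def ncut_eigenvector(features, tau=0.2):
    """Second-smallest generalized eigenvector of (D - W) y = lambda * D y.

    `features` is an (N, D) array of patch features; `tau` is an assumed,
    fixed temperature for turning cosine similarity into positive affinities.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = np.exp((f @ f.T - 1.0) / tau)          # affinities in (0, 1]
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    lap = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)              # eigenvalues in ascending order
    return vecs[:, 1] * d_inv_sqrt             # map back to the generalized problem


def connected_components(mask):
    """Split a boolean (H, W) mask into 4-connected components."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    components = []
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or seen[r, c]:
                continue
            comp = np.zeros((h, w), dtype=bool)
            queue = deque([(r, c)])
            seen[r, c] = True
            while queue:
                i, j = queue.popleft()
                comp[i, j] = True
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w and mask[ni, nj] and not seen[ni, nj]:
                        seen[ni, nj] = True
                        queue.append((ni, nj))
            components.append(comp)
    return components


def toy_demo():
    """Two separated 'objects' on a 6x8 patch grid, cut once, then split."""
    h, w = 6, 8
    rng = np.random.default_rng(0)
    labels = np.zeros((h, w), dtype=bool)
    labels[1:3, 1:3] = True                    # first toy object
    labels[3:5, 5:7] = True                    # second toy object
    feats = np.where(labels.reshape(-1, 1), [0.0, 1.0], [1.0, 0.0])
    feats = feats + 0.01 * rng.standard_normal(feats.shape)

    y = ncut_eigenvector(feats)                # NCut is applied exactly once
    fg = (y > y.mean()).reshape(h, w)
    if fg.sum() > fg.size / 2:                 # assumed heuristic: smaller side is foreground
        fg = ~fg
    return labels, connected_components(fg)


if __name__ == "__main__":
    _, masks = toy_demo()
    print(f"{len(masks)} masks from one NCut pass")
```

In this toy setting the eigenvector separates foreground from background, and the spatial split is what turns one bipartition into several instance masks. Real self-supervised features are far noisier, which is presumably where the paper's additional modules come in.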
Peer Reviews
Decision: Submitted to ICLR 2026
1. The presentation is clear and logical, effectively demonstrating the overall pipeline design with well-structured experiments.
2. The experimental results demonstrate strong performance, achieving favorable metrics when compared directly against the established baseline methods, suggesting the efficacy and competitive advantage of the proposed approach.

1. The overall pipeline lacks significant novelty, bearing a strong resemblance to existing unsupervised methods. Although the authors introduce several techniques to enhance pseudo label quality, the specific mechanisms (such as the methods used to refine the pseudo segmentation map and the filtering strategies) do not demonstrate a substantial departure from classical computer vision approaches.
2. The performance improvements achieved by the proposed method are marginal when compared to the ex
- The method is technically sound and implementation-oriented: the density-tuned similarity (adaptive temperature), boundary-augmented eigenvector, and rank-based component selection are specified mathematically and ablated individually and cumulatively.
- The pipeline figure and intermediate visualizations (raw/“boundary”/difference eigenvectors, component maps) explain why single-pass NCut can still separate multiple instances. The writing is direct; notation for the three modules is com
- The three modules—adaptive temperature on cosine affinities, boundary emphasizing via neighborhood differencing, and rank filtering—are spectral pre/post-processing heuristics layered on a classical NCut pipeline (no new learning principle or theory). In contrast, DiffCut offers a more substantive change of backbone (diffusion UNet features) *and* a recursive NCut with granularity control; DiffNCut explores differentiability for end-to-end learning. The paper’s “SOTA” claim should be carefully
The presentation of this paper is clear. Writing logic is easy to follow, and illustrations are helpful for understanding.
1. The proposed method cannot fundamentally solve the issues mentioned in the paper (i.e. more accurate multi-object segmentation):
   - As long as pseudo-labels are generated directly by NCut on self-supervised features, and the SSL features do not reveal real objectness, heuristic tricks like boundary processing and connected-component analysis can only yield incremental improvement; they cannot fundamentally solve the problem.
2. The designs of some modules are not well justified:
   - First 2 modules in
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
