SOHES: Self-supervised Open-world Hierarchical Entity Segmentation
Shengcao Cao, Jiuxiang Gu, Jason Kuen, Hao Tan, Ruiyi Zhang, Handong Zhao, Ani Nenkova, Liang-Yan Gui, Tong Sun, Yu-Xiong Wang

TL;DR
SOHES introduces a self-supervised, hierarchical entity segmentation method that eliminates the need for human annotations, achieving high performance in open-world image segmentation by leveraging pseudo-labels and mutual learning.
Contribution
It presents a novel self-supervised approach for open-world entity segmentation that captures hierarchical structures without human annotations.
Findings
Achieves state-of-the-art performance in self-supervised open-world segmentation.
Effectively captures hierarchical structures of entities and parts.
Eliminates reliance on costly human annotations.
Abstract
Open-world entity segmentation, as an emerging computer vision task, aims to segment entities in images without being restricted to pre-defined classes, offering impressive generalization to unseen images and concepts. Despite its promise, existing entity segmentation methods like Segment Anything Model (SAM) rely heavily on costly expert annotators. This work presents Self-supervised Open-world Hierarchical Entity Segmentation (SOHES), a novel approach that eliminates the need for human annotations. SOHES operates in three phases: self-exploration, self-instruction, and self-correction. Given a pre-trained self-supervised representation, we produce abundant high-quality pseudo-labels through visual feature clustering. Then, we train a segmentation model on the pseudo-labels, and rectify the noise in the pseudo-labels via a teacher-student mutual-learning procedure. Beyond…
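The abstract says only that pseudo-labels come from "visual feature clustering". The following is a minimal sketch of one way clustering can yield nested (hierarchical) masks, not the paper's exact procedure. It assumes a grid of per-patch features (a toy array stands in for self-supervised features such as DINO), cosine similarity between region means, greedy merging of adjacent regions, and a hand-picked schedule of decreasing thresholds. The function name `merge_regions` and the thresholds are hypothetical. Because each threshold only merges regions left by the previous one, the saved label maps nest from fine to coarse.

```python
import numpy as np


def merge_regions(features, thresholds):
    """Greedy bottom-up merging of 4-connected regions on a patch grid.

    features:   (H, W, D) array of per-patch features (stand-in for DINO features).
    thresholds: decreasing cosine-similarity cutoffs; each cutoff emits one level.
    Returns a list of (H, W) integer label maps, finest level first.
    """
    h, w, _ = features.shape
    labels = np.arange(h * w).reshape(h, w)  # every patch starts as its own region
    levels = []
    for tau in thresholds:
        while True:
            # L2-normalized mean feature of every current region
            means = {}
            for r in np.unique(labels):
                m = features[labels == r].mean(axis=0)
                means[r] = m / (np.linalg.norm(m) + 1e-8)
            # all pairs of distinct regions that touch horizontally or vertically
            pairs = set()
            for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
                for x, y in zip(a.ravel(), b.ravel()):
                    if x != y:
                        pairs.add((min(x, y), max(x, y)))
            if not pairs:
                break  # everything is already one region
            # merge the most similar adjacent pair, but only if it clears the cutoff
            i, j = max(pairs, key=lambda p: float(means[p[0]] @ means[p[1]]))
            if float(means[i] @ means[j]) < tau:
                break
            labels[labels == j] = i
        levels.append(labels.copy())  # snapshot; later levels only coarsen this one
    return levels
```

On a toy 4×4 map whose left and right halves carry orthogonal features, thresholds `[0.99, -1.0]` give two regions at the first level and a single region at the second, and each fine region sits inside exactly one coarse region. The greedy one-merge-per-iteration loop is quadratic and only meant for illustration.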
Peer Reviews
Decision: ICLR 2024 poster
[Task] Unsupervised image segmentation holds significant importance, and this study successfully performs segmentation without human supervision, offering segmentation masks at multiple levels of granularity. [The generation of hierarchical masks] The approach to generating unsupervised hierarchical masks is quite interesting. Surprisingly, this method surpasses SAM in recall on some evaluation benchmarks. [Paper writing] The paper is well-articulated, effectively communicating the central
[Technical Contributions] The three phases proposed in this work are very similar to the Cut-and-Learn pipeline proposed by CutLER [1]. Self-exploration is quite similar to the MaskCut stage in CutLER, which also leverages DINO features for pseudo-label generation. Self-instruction is the same as the LEARN process of CutLER, which trains a model on pseudo-labels. The self-correction stage can be viewed as a variant of CutLER's multi-round self-training, but with a teacher-student framework.
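The review above contrasts CutLER's multi-round self-training with a teacher-student framework. A common way to realize a teacher-student setup (the "mean teacher" pattern) is to keep the teacher as an exponential moving average (EMA) of the student's weights, updated after every student step, whereas multi-round self-training freezes the label-producing model for a whole round and retrains from scratch or from a checkpoint. The source does not state SOHES's update rule, so this is a minimal sketch under assumptions: parameters stored as a dict of NumPy arrays, a hypothetical `ema_update` helper, and a default momentum of 0.999.

```python
import numpy as np


def ema_update(teacher, student, momentum=0.999):
    """EMA step: teacher <- m * teacher + (1 - m) * student, per parameter.

    teacher, student: dicts mapping parameter names to NumPy arrays of equal shape.
    The teacher dict is updated in place and also returned.
    """
    for name, s in student.items():
        teacher[name] = momentum * teacher[name] + (1.0 - momentum) * s
    return teacher


# Toy check: a fixed student pulls the teacher toward it geometrically.
teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
for _ in range(10):
    # In a real loop the student would first take a gradient step on
    # pseudo-labels (possibly refined by the teacher's predictions).
    ema_update(teacher, student, momentum=0.9)
print(teacher["w"])  # each entry is 1 - 0.9**10, roughly 0.6513
```

The design intent is that the teacher changes slowly, so its predictions are steadier targets than the student's own outputs, while still improving continuously instead of only at round boundaries.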
1. The task SOHES tackles is seldom investigated.
2. This paper proposed a new method to generate hierarchical proposals.
3. This paper proposed an ancestor prediction head, which is novel.
4. The proposed method significantly outperformed previous methods.
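The review singles out the ancestor prediction head, but the source does not say how ancestor targets are defined. One plausible way to derive them from a set of pseudo-label masks is by containment. The following minimal sketch assumes that mask j counts as an ancestor of mask i when j covers at least 90% of i's pixels and is strictly larger. The helper names `ancestor_matrix` and `parent_of` and the 0.9 cutoff are hypothetical. A matrix like this could serve as the training target for a head that predicts, for each mask, which other masks are its ancestors.

```python
import numpy as np


def ancestor_matrix(masks, cover=0.9):
    """A[i, j] is True when mask j is treated as an ancestor of mask i.

    masks: (N, H, W) boolean array. Mask j is an ancestor of mask i if it
    covers at least `cover` of mask i's pixels and has strictly larger area.
    """
    flat = masks.reshape(len(masks), -1).astype(np.float64)
    area = flat.sum(axis=1)                      # pixels per mask
    inter = flat @ flat.T                        # pairwise intersection areas
    frac = inter / np.maximum(area[:, None], 1)  # fraction of mask i inside mask j
    return (frac >= cover) & (area[None, :] > area[:, None])


def parent_of(masks, cover=0.9):
    """Index of each mask's smallest ancestor (its direct parent), or -1 for roots."""
    anc = ancestor_matrix(masks, cover)
    area = masks.reshape(len(masks), -1).sum(axis=1)
    parents = []
    for i in range(len(masks)):
        cands = np.flatnonzero(anc[i])
        parents.append(int(cands[np.argmin(area[cands])]) if cands.size else -1)
    return parents


# Whole image, left half, top-left quadrant, right half on a 4x4 grid.
m = np.zeros((4, 4, 4), dtype=bool)
m[0] = True
m[1][:, :2] = True
m[2][:2, :2] = True
m[3][:, 2:] = True
print(parent_of(m))  # [-1, 0, 1, 0]
```

The strict area comparison keeps a mask from being its own ancestor and stops two identical masks from claiming each other; the soft coverage cutoff tolerates small boundary disagreements between noisy pseudo-labels.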
1. Although this paper divides the pipeline into self-exploration, self-instruction, and self-correction, it resembles previous papers [1] that generate pseudo-labels, train on them, and apply self-training to improve the model. This framework is quite common, so what is the core difference from previous works?
2. The authors claimed that “Existing segmentation models cannot predict the hierarchical relations among masks.” However, methods li
1. The proposed method is well motivated, and the results of SOHES demonstrate tremendous potential in open-world entity segmentation, even surpassing SAM's performance on certain datasets.
2. The paper is excellently structured and provides a clear, easy-to-follow presentation.
1. The analysis of the hierarchical architecture is inadequate.
2. Certain details of the method lack clarity, and some ablation experiments are missing.
Taxonomy
Topics: Web Data Mining and Analysis · Data Quality and Management · Data Mining Algorithms and Applications
