Order-aware Interactive Segmentation
Bin Wang, Anwesa Choudhuri, Meng Zheng, Zhongpai Gao, Benjamin Planche, Andong Deng, Qin Liu, Terrence Chen, Ulas Bagci, Ziyan Wu

TL;DR
This paper introduces OIS, an order-aware interactive segmentation method that encodes relative depth between objects to improve segmentation accuracy and efficiency, achieving state-of-the-art results and faster inference.
Contribution
The paper proposes a novel order-aware attention mechanism and object-aware modules that incorporate depth order information into interactive segmentation.
Findings
Achieves 7.61% mIoU improvement on HQSeg44K after one click.
Doubles inference speed compared to previous methods.
Outperforms prior state-of-the-art in accuracy and efficiency.
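For readers unfamiliar with the metric, the sketch below shows how a mean intersection-over-union (mIoU) score is commonly computed for binary masks. It is a generic illustration, not the paper's evaluation code: the function names `iou` and `mean_iou` are hypothetical, and the exact HQSeg44K protocol (click simulation, averaging scheme) is not given in this summary.

```python
import numpy as np


def iou(pred, gt):
    """Intersection-over-union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def mean_iou(pairs):
    """Average IoU over an iterable of (predicted mask, ground-truth mask) pairs."""
    return float(np.mean([iou(p, g) for p, g in pairs]))
```

A reported gain after one click would then correspond to comparing `mean_iou` over the masks each method predicts from a single simulated click per image.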
Abstract
Interactive segmentation aims to accurately segment target objects with minimal user interactions. However, current methods often fail to accurately separate target objects from the background, due to a limited understanding of order, the relative depth between objects in a scene. To address this issue, we propose OIS: order-aware interactive segmentation, where we explicitly encode the relative depth between objects into order maps. We introduce a novel order-aware attention, where the order maps seamlessly guide the user interactions (in the form of clicks) to attend to the image features. We further present an object-aware attention module to incorporate a strong object-level understanding to better differentiate objects with similar order. Our approach allows both dense and sparse integration of user clicks, enhancing both accuracy and efficiency as compared to prior works.…
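The abstract's idea of an order map guiding click-to-image attention can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not give the paper's formulas, so `order_map` is defined simply as the normalized absolute depth difference to the clicked pixels, and `order_biased_attention` subtracts that map from the attention logits with a hypothetical `strength` parameter. The actual OIS modules may differ substantially.

```python
import numpy as np


def order_map(depth, clicks):
    """Hypothetical order map: normalized |depth - depth at clicks|, in [0, 1].

    depth  : (H, W) array of per-pixel depth (e.g. from a monocular depth model)
    clicks : list of (row, col) positions of user clicks
    """
    depth = np.asarray(depth, dtype=float)
    ref = np.mean([depth[r, c] for r, c in clicks])  # reference depth of the clicked object
    span = depth.max() - depth.min()
    if span == 0:
        return np.zeros_like(depth)  # flat scene: every pixel shares the same order
    return np.abs(depth - ref) / span


def order_biased_attention(query, keys, values, order, strength=5.0):
    """Single-query attention whose logits are penalized by the order map.

    query  : (d,) click embedding
    keys   : (N, d) image features, one per pixel (N = H * W)
    values : (N, d_v) image values
    order  : (H, W) or (N,) order map; larger means farther in depth from the click
    """
    logits = keys @ query / np.sqrt(query.shape[0])
    logits = logits - strength * np.asarray(order).reshape(-1)  # assumed additive bias
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ values, weights
```

Under these assumptions, pixels at a depth similar to the clicked point keep most of the attention mass, while pixels much nearer or farther are suppressed, which is one way to picture why depth order helps separate a target from its background.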
Peer Reviews
Decision: ICLR 2025 Poster
1. The paper is well-written, with clear motivation. 2. It provides extensive experiments that convincingly demonstrate the proposed method's effectiveness. 3. The visualized analysis adds value by highlighting the necessity of incorporating depth information. 4. The method's performance surpasses current state-of-the-art methods.
1. The paper exceeds the ICLR 2025 10-page limit with 11 pages in the main text. 2. Overall, the technical contribution and novelty of this paper are incremental, as it mainly incorporates existing priors, such as depth maps and foreground-background masks, to enhance segmentation accuracy. Since these priors have already proven effective in general segmentation tasks, their success in interactive segmentation is unsurprising. I would encourage the authors to clarify the unique benefits these pr
1. The paper has good writing and structure; its ideas and figures are easy to read and understand. 2. The paper validates its model design choices in Sec. 4.5 and Table 4, which clearly show the performance gain brought by each proposed component. 3. Using relative depth order to guide segmentation is an interesting idea, and the proposed order map considers both positive and negative clicks.
1. The paper uses an additional monocular depth prediction as input. Does the computing cost reported in Table 3 include the depth model? The paper should also study the impact of different depth prediction models on segmentation performance. 2. The robustness of the segmentation model to errors introduced by the depth prediction network is not studied. When depth prediction makes large errors, how does this influence the segmentation model? Especially, when segmenting neighboring obje
1. The method looks very good for segmenting some difficult foreground objects (tennis rackets, bicycle wheels, etc.). 2. The order-aware attention module introduced in this paper is easy to accept and effective. 3. The framework of this paper is relatively concise and the implementation is easy to understand.
1. The method section of this paper is relatively easy to understand, but the motivation and the problems to be solved are not stated concisely enough. The authors seem to want to solve many problems and propose many improvements, which can easily leave readers unsure why certain techniques are proposed. 2. The experiments in this paper are insufficient, lacking results on typical interactive segmentation datasets such as GrabCut, Berkeley, SBD, PascalVOC, etc. In addition, there is a lac
Taxonomy
Topics: Artificial Intelligence in Games · Constraint Satisfaction and Optimization · Video Analysis and Summarization
Methods: Softmax · Attention Is All You Need · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
