SOLQ: Segmenting Objects by Learning Queries
Bin Dong, Fangao Zeng, Tiancai Wang, Xiangyu Zhang, Yichen Wei

TL;DR
SOLQ is an end-to-end, transformer-based instance segmentation framework built on DETR. It learns unified object queries that jointly represent class, location, and mask, reaching state-of-the-art segmentation results and improving DETR's detection performance.
Contribution
The paper presents SOLQ, a DETR-based method that learns unified queries for object classification, box localization, and mask encoding, extending DETR to instance segmentation.
Findings
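Achieves state-of-the-art segmentation performance, surpassing most existing approaches.
Joint learning of the unified query representation improves DETR's detection accuracy.
At inference, mask vectors are transformed directly into spatial masks by inverting the compression coding.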
Abstract
In this paper, we propose an end-to-end framework for instance segmentation. Based on the recently introduced DETR [1], our method, termed SOLQ, segments objects by learning unified queries. In SOLQ, each query represents one object and has multiple representations: class, location and mask. The learned object queries perform classification, box regression and mask encoding simultaneously in a unified vector form. During the training phase, the encoded mask vectors are supervised by the compression coding of raw spatial masks. At inference time, the produced mask vectors can be directly transformed to spatial masks by the inverse process of compression coding. Experimental results show that SOLQ can achieve state-of-the-art performance, surpassing most existing approaches. Moreover, the joint learning of unified query representation can greatly improve the detection performance of DETR. We…
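The mask encoding and decoding steps described in the abstract can be illustrated with a small sketch. The abstract does not name the compression coding, so this example assumes a 2D discrete cosine transform (DCT) and keeps only a square block of low-frequency coefficients as the mask vector. The function names (`encode_mask`, `decode_mask`), the `keep` parameter, and the 0.5 threshold are hypothetical choices for illustration, not the authors' implementation.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def encode_mask(mask, keep):
    """Compress a square binary mask into a vector of keep*keep coefficients.

    Assumption: low-frequency selection is a simple top-left square block.
    """
    d = dct_matrix(len(mask))
    coeffs = matmul(matmul(d, mask), transpose(d))  # 2D DCT: D M D^T
    return [coeffs[r][c] for r in range(keep) for c in range(keep)]

def decode_mask(vector, n, keep, threshold=0.5):
    """Invert the coding: zero-pad, apply the inverse DCT, then binarize."""
    coeffs = [[0.0] * n for _ in range(n)]
    for idx, value in enumerate(vector):
        coeffs[idx // keep][idx % keep] = value
    d = dct_matrix(n)
    recon = matmul(matmul(transpose(d), coeffs), d)  # inverse: D^T C D
    return [[1 if v > threshold else 0 for v in row] for row in recon]

# An 8x8 mask containing a centered 4x4 square of foreground pixels.
mask = [[1 if 2 <= r < 6 and 2 <= c < 6 else 0 for c in range(8)]
        for r in range(8)]
target_vector = encode_mask(mask, keep=4)    # what a query would be trained to regress
predicted = decode_mask(target_vector, n=8, keep=4)  # what inference would recover
```

In a setup like this, the training target for each query's mask branch would be the fixed-length `target_vector`, and inference needs only the inverse transform, with no learned upsampling. Smaller `keep` values give shorter vectors at the cost of coarser reconstructed masks.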
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Image and Object Detection Techniques
Methods: Softmax · Convolution · Dense Connections · Feedforward Network · Detection Transformer
