SOLOv2: Dynamic and Fast Instance Segmentation
Xinlong Wang, Rufeng Zhang, Tao Kong, Lei Li, Chunhua Shen

TL;DR
SOLOv2 is a fast instance segmentation framework that learns its mask head dynamically, conditioned on object location, and uses Matrix NMS to cut the inference-time overhead of mask NMS.
Contribution
It proposes a novel dynamic mask head conditioned on object location and a new Matrix NMS method, improving speed and accuracy over previous methods.
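A minimal sketch of what a location-conditioned mask head could look like, assuming the predicted kernels act as 1x1 convolutions over a shared feature map. The function name `dynamic_mask_head`, the shapes, and the random inputs are illustrative assumptions, not the authors' code; in a real model both arrays would be network outputs.

```python
import numpy as np

def dynamic_mask_head(features, kernels):
    """Apply one predicted kernel per location to a shared feature map.

    features: (E, H, W) output of a hypothetical mask feature branch.
    kernels:  (N, E) output of a hypothetical mask kernel branch,
              one 1x1 kernel per candidate location (assumption).
    Returns soft masks of shape (N, H, W) with values in (0, 1).
    """
    e, h, w = features.shape
    # A 1x1 convolution is a matrix product over the channel axis.
    logits = kernels @ features.reshape(e, h * w)
    # Sigmoid turns the logits into per-pixel mask probabilities.
    return (1.0 / (1.0 + np.exp(-logits))).reshape(-1, h, w)

# Random stand-ins for the two branch outputs.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4, 6))   # E=8 channels, 4x6 map
kernels = rng.standard_normal((3, 8))       # 3 candidate locations
masks = dynamic_mask_head(features, kernels)
```

Because each location supplies its own kernel, the same feature map yields a different mask per location, which is the sense in which the head is "conditioned on location".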
Findings
Achieves 37.1% AP at 31.3 FPS with a lightweight model.
Outperforms a few state-of-the-art methods in both speed and accuracy.
Demonstrates strong results in object detection and panoptic segmentation.
Abstract
In this work, we aim at building a simple, direct, and fast instance segmentation framework with strong performance. We follow the principle of the SOLO method of Wang et al. "SOLO: segmenting objects by locations". Importantly, we take one step further by dynamically learning the mask head of the object segmenter such that the mask head is conditioned on the location. Specifically, the mask branch is decoupled into a mask kernel branch and mask feature branch, which are responsible for learning the convolution kernel and the convolved features respectively. Moreover, we propose Matrix NMS (non maximum suppression) to significantly reduce the inference time overhead due to NMS of masks. Our Matrix NMS performs NMS with parallel matrix operations in one shot, and yields better results. We demonstrate a simple direct instance segmentation system, outperforming a few state-of-the-art…
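The "parallel matrix operations in one shot" idea can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation: the linear decay rule, the small epsilon guards, and the toy masks are assumptions that the abstract above does not spell out. Scores are assumed to be sorted in descending order.

```python
import numpy as np

def matrix_nms(scores, masks):
    """Decay mask scores using pairwise IoU, with no sequential loop.

    scores: (N,) mask scores, sorted in descending order.
    masks:  (N, H, W) binary masks.
    Returns decayed scores of shape (N,).
    """
    n = len(scores)
    flat = masks.reshape(n, -1).astype(np.float64)
    inter = flat @ flat.T                          # pairwise intersections
    areas = flat.sum(axis=1)
    union = areas[:, None] + areas[None, :] - inter
    # Keep only pairs (i, j) where i has the higher score (i < j).
    iou = np.triu(inter / np.maximum(union, 1e-9), k=1)
    # For each mask, its largest overlap with any higher-scoring mask.
    iou_cmax = iou.max(axis=0)
    # Linear decay (assumption): a mask that is itself heavily
    # overlapped suppresses its lower-scoring neighbours less.
    decay = (1.0 - iou) / np.maximum(1.0 - iou_cmax[:, None], 1e-9)
    return scores * decay.min(axis=0)

# Toy example: mask 1 overlaps mask 0, mask 2 is disjoint from both.
masks = np.array([
    [[1, 1], [0, 0]],
    [[1, 1], [1, 0]],
    [[0, 0], [0, 1]],
])
scores = np.array([0.9, 0.8, 0.7])
decayed = matrix_nms(scores, masks)
```

In this toy case the overlapping second mask has its score reduced, while the top-scoring mask and the disjoint third mask keep theirs. A threshold on the decayed scores would then pick the final masks.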
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Matrix Non-Maximum Suppression · Convolution
