PanSR: An Object-Centric Mask Transformer for Panoptic Segmentation
Lojze Žust, Matej Kristan

TL;DR
PanSR introduces an object-centric mask transformer that improves small object detection and scene segmentation in crowded scenes, achieving state-of-the-art results on multiple benchmarks.
Contribution
The paper proposes a novel panoptic segmentation method that addresses key shortcomings of existing mask-transformer approaches, notably improving small-object detection and reducing instance merging.
Findings
+3.4 PQ improvement on the LaRS benchmark
State-of-the-art performance on Cityscapes
Effective mitigation of instance merging and improved small-object detection
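The gains above are reported in panoptic quality (PQ). A minimal sketch of the standard PQ computation may help when reading the +3.4 figure: predicted and ground-truth segments of the same category are matched when their IoU exceeds 0.5, and PQ = (sum of matched IoUs) / (|TP| + ½|FP| + ½|FN|), averaged over categories. This is not the authors' evaluation code. It assumes non-overlapping segments stored as sets of (row, col) pixels, it omits the void and crowd handling used by the official benchmarks, and the toy "boat" masks are hypothetical. The example also shows why instance merging is costly: one mask covering two separate boats matches neither of them.

```python
from collections import defaultdict


def iou(a: set, b: set) -> float:
    """Intersection-over-union of two sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(gt: dict, pred: dict) -> float:
    """Simplified PQ without void/crowd handling (illustrative sketch).

    gt, pred: {segment_id: (category, set_of_pixels)}, segments non-overlapping.
    Returns PQ averaged over all categories appearing in gt or pred.
    """
    iou_sum, tp = defaultdict(float), defaultdict(int)
    matched_gt, matched_pred = set(), set()
    for gid, (gcat, gpix) in gt.items():
        for pid, (pcat, ppix) in pred.items():
            if gcat != pcat:
                continue
            v = iou(gpix, ppix)
            # With non-overlapping segments, IoU > 0.5 implies a unique match.
            if v > 0.5:
                iou_sum[gcat] += v
                tp[gcat] += 1
                matched_gt.add(gid)
                matched_pred.add(pid)
    fn, fp = defaultdict(int), defaultdict(int)
    for gid, (gcat, _) in gt.items():
        if gid not in matched_gt:
            fn[gcat] += 1  # missed ground-truth segment
    for pid, (pcat, _) in pred.items():
        if pid not in matched_pred:
            fp[pcat] += 1  # unmatched prediction
    cats = {c for c, _ in gt.values()} | {c for c, _ in pred.values()}
    scores = []
    for c in cats:
        denom = tp[c] + 0.5 * fp[c] + 0.5 * fn[c]
        scores.append(iou_sum[c] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy scene: two well-separated boats.
a = {(0, 0), (0, 1), (1, 0), (1, 1)}
b = {(0, 4), (0, 5), (1, 4), (1, 5)}
gt = {1: ("boat", a), 2: ("boat", b)}
split = {10: ("boat", a), 11: ("boat", b)}  # two correct instance masks
merged = {10: ("boat", a | b)}              # both boats merged into one mask
print(panoptic_quality(gt, split))   # 1.0
print(panoptic_quality(gt, merged))  # 0.0
```

In the merged case the single mask has IoU exactly 0.5 with each boat, so it produces one false positive and two false negatives instead of two true positives.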
Abstract
Panoptic segmentation is a fundamental task in computer vision and a crucial component for perception in autonomous vehicles. Recent mask-transformer-based methods achieve impressive performance on standard benchmarks but face significant challenges with small objects, crowded scenes and scenes exhibiting a wide range of object scales. We identify several fundamental shortcomings of the current approaches: (i) the query proposal generation process is biased towards larger objects, resulting in missed smaller objects, (ii) initially well-localized queries may drift to other objects, resulting in missed detections, (iii) spatially well-separated instances may be merged into a single mask, causing inconsistent and false scene interpretations. To address these issues, we rethink the individual components of the network and its supervision, and propose a novel method for panoptic segmentation…
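Shortcoming (iii), merging spatially well-separated instances into one mask, can be spotted in a model's output with a simple heuristic: count the connected components of each predicted instance mask. The sketch below is a hypothetical diagnostic, not part of PanSR; the function names and the (row, col) pixel-set representation are assumptions made for illustration.

```python
from collections import deque


def count_components(pixels: set) -> int:
    """Number of 4-connected components in a set of (row, col) pixels."""
    remaining = set(pixels)
    count = 0
    while remaining:
        count += 1
        queue = deque([remaining.pop()])  # seed a breadth-first flood fill
        while queue:
            r, c = queue.popleft()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    queue.append(nb)
    return count


def flag_possible_merges(pred: dict) -> list:
    """Hypothetical check: ids of instance masks with more than one component.

    pred: {segment_id: (category, set_of_pixels)}
    """
    return sorted(sid for sid, (_, pix) in pred.items() if count_components(pix) > 1)


# Hypothetical toy predictions: two boats, either split or merged.
a = {(0, 0), (0, 1), (1, 0), (1, 1)}
b = {(0, 4), (0, 5), (1, 4), (1, 5)}
print(flag_possible_merges({10: ("boat", a | b)}))             # [10]
print(flag_possible_merges({10: ("boat", a), 11: ("boat", b)}))  # []
```

A mask with several components is only a hint, not proof of merging: an object split by occlusion (for example a boat partly hidden behind a mooring post) also yields multiple components.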
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Handwritten Text Recognition Techniques · Image and Object Detection Techniques
