High-Quality Entity Segmentation
Lu Qi, Jason Kuen, Weidong Guo, Tiancheng Shen, Jiuxiang Gu, Jiaya Jia, Zhe Lin, Ming-Hsuan Yang

TL;DR
This paper introduces a new high-quality entity segmentation dataset and a novel query-based Transformer method, CropFormer, that effectively fuses multi-view image information to improve dense segmentation accuracy in diverse, high-resolution images.
Contribution
The paper presents a new dataset focused on high-quality dense segmentation in the wild and a novel Transformer architecture, CropFormer, for improved multi-view mask fusion in high-resolution images.
Findings
CropFormer achieves a 1.9 AP improvement on entity segmentation.
The dataset enables better generalization across diverse domains.
CropFormer enhances traditional segmentation tasks.
Abstract
Dense image segmentation tasks (e.g., semantic, panoptic) are useful for image editing, but existing methods can hardly generalize well in an in-the-wild setting where there are unrestricted image domains, classes, and image resolution and quality variations. Motivated by these observations, we construct a new entity segmentation dataset, with a strong focus on high-quality dense segmentation in the wild. The dataset contains images spanning diverse image domains and entities, along with plentiful high-resolution images and high-quality mask annotations for training and testing. Given the high-quality and -resolution nature of the dataset, we propose CropFormer which is designed to tackle the intractability of instance-level segmentation on high-resolution images. It improves mask prediction by fusing high-res image crops that provide more fine-grained image details and the full image.…
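To make the idea of combining a crop-level prediction with a full-image prediction concrete, here is a minimal sketch. It is not CropFormer's method: the paper describes a query-based Transformer, whereas this example swaps in plain logit averaging over the crop window. The function names (`fuse_crop_into_full`, `to_binary_mask`), the equal weighting, and the toy values are all assumptions made for illustration, and the crop logits are assumed to be already resized to the crop's footprint in full-image coordinates.

```python
def fuse_crop_into_full(full_logits, crop_logits, top, left):
    """Average crop logits into the matching window of the full-image logits.

    Hypothetical helper: simple averaging stands in for a learned fusion.
    Pixels outside the crop window keep the full-image prediction.
    """
    fused = [row[:] for row in full_logits]  # copy so the input is not mutated
    for dy, crop_row in enumerate(crop_logits):
        for dx, crop_value in enumerate(crop_row):
            y, x = top + dy, left + dx
            fused[y][x] = 0.5 * (full_logits[y][x] + crop_value)
    return fused


def to_binary_mask(logits, threshold=0.0):
    """Threshold mask logits into a 0/1 mask."""
    return [[1 if value > threshold else 0 for value in row] for row in logits]


# Toy 4x4 full-image logits for one entity; the pixel at (2, 2) is missed.
full = [
    [-2.0, -2.0, -2.0, -2.0],
    [-2.0, 1.0, 1.0, -2.0],
    [-2.0, 1.0, -1.0, -2.0],
    [-2.0, -2.0, -2.0, -2.0],
]
# Toy 2x2 crop logits covering rows 1-2, columns 1-2 of the full image.
crop = [
    [3.0, 3.0],
    [3.0, 3.0],
]

fused = fuse_crop_into_full(full, crop, top=1, left=1)
mask = to_binary_mask(fused)
```

In this toy case the crop's more confident prediction recovers the pixel at row 2, column 2 that the full-image logits alone would have dropped, while everything outside the crop window is unchanged.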
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Position-Wise Feed-Forward Layer · Linear Layer · Label Smoothing · Softmax · Adam · Absolute Position Encodings · Layer Normalization
