PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

Jiang Liu; Hui Ding; Zhaowei Cai; Yuting Zhang; Ravi Kumar Satzoda; Vijay Mahadevan; R. Manmatha

arXiv:2302.07387 · cs.CV · March 29, 2023 · 6 cites

Open Access · 1 Repo · 2 Models

TL;DR

PolyFormer introduces a novel sequence-to-sequence polygon generation framework for referring image segmentation, outperforming previous methods and demonstrating strong generalization to video segmentation tasks.

Contribution

The paper proposes a new Polygon Transformer framework that predicts polygons directly, improving geometric localization and segmentation accuracy over existing pixel-based methods.

Findings

01 Outperforms prior art by 5.40% and 4.52% absolute on the RefCOCO+ and RefCOCOg datasets

02 Achieves 61.5% J&F on Ref-DAVIS17 without fine-tuning

03 Uses a regression-based decoder that predicts floating-point coordinates directly, avoiding coordinate quantization error
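
To make the term "coordinate quantization error" concrete, here is a minimal Python sketch of what happens when a normalized coordinate is snapped to a discrete bin and decoded again. This is not taken from the paper or its repository: the function names and the bin count of 64 are hypothetical, chosen only for illustration. It shows the kind of error that a decoder regressing floating-point coordinates does not incur.

```python
def quantize(coord: float, num_bins: int) -> int:
    """Map a normalized coordinate in [0, 1] to a discrete bin index."""
    # Clamp so that coord == 1.0 falls into the last bin instead of overflowing.
    return min(int(coord * num_bins), num_bins - 1)


def dequantize(index: int, num_bins: int) -> float:
    """Decode a bin index to the bin center, bounding the error by half a bin width."""
    return (index + 0.5) / num_bins


def round_trip_error(coord: float, num_bins: int) -> float:
    """Absolute error introduced by quantizing and then decoding one coordinate."""
    return abs(dequantize(quantize(coord, num_bins), num_bins) - coord)


if __name__ == "__main__":
    num_bins = 64  # hypothetical bin count, for illustration only
    for coord in (0.0, 0.1234, 0.5, 0.9999):
        print(coord, round_trip_error(coord, num_bins))
```

Under these assumptions the round-trip error never exceeds half a bin width, `0.5 / num_bins` in normalized units, which on a large image can still amount to several pixels per vertex.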

Abstract

In this work, instead of directly predicting the pixel-level segmentation masks, the problem of referring image segmentation is formulated as sequential polygon generation, and the predicted polygons can be later converted into segmentation masks. This is enabled by a new sequence-to-sequence framework, Polygon Transformer (PolyFormer), which takes a sequence of image patches and text query tokens as input, and outputs a sequence of polygon vertices autoregressively. For more accurate geometric localization, we propose a regression-based decoder, which predicts the precise floating-point coordinates directly, without any coordinate quantization error. In the experiments, PolyFormer outperforms the prior art by a clear margin, e.g., 5.40% and 4.52% absolute improvements on the challenging RefCOCO+ and RefCOCOg datasets. It also shows strong generalization ability when evaluated on the…
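
The abstract notes that predicted polygons can later be converted into segmentation masks. As a rough illustration of that conversion step, the sketch below rasterizes a single polygon with the even-odd rule, testing each pixel center. It is an assumption-laden example and not the authors' implementation: `polygon_to_mask` is a hypothetical helper, and it uses only NumPy.

```python
import numpy as np


def polygon_to_mask(vertices, height, width):
    """Rasterize one polygon into a boolean mask (even-odd rule at pixel centers).

    vertices: list of (x, y) floating-point pairs in pixel units.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    px = xs + 0.5  # x coordinate of each pixel center
    py = ys + 0.5  # y coordinate of each pixel center
    inside = np.zeros((height, width), dtype=bool)
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        if y0 == y1:
            continue  # a horizontal edge cannot cross a horizontal ray
        # Does the edge straddle the horizontal line through the pixel center?
        crosses = (y0 > py) != (y1 > py)
        # x position where the edge meets that horizontal line.
        x_at_y = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
        # Toggle pixels whose rightward ray crosses this edge.
        inside ^= crosses & (px < x_at_y)
    return inside


if __name__ == "__main__":
    square = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
    print(polygon_to_mask(square, 4, 4).astype(int))
```

Because the vertices stay as floating-point values until this final step, any loss of precision comes from the rasterization grid alone. An object described by several polygons could be handled by combining the per-polygon masks, though how the paper merges them is not stated in the text above.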

Code & Models

Repositories

amazon-science/polygon-transformer (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Linear Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Adam · Position-Wise Feed-Forward Layer · Softmax