PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding

Zihan Ding; Zi-han Ding; Tianrui Hui; Junshi Huang; Xiaoming Wei; Xiaolin Wei; Si Liu

arXiv:2208.05647·cs.CV·August 12, 2022

TL;DR

This paper introduces PPMN, a one-stage end-to-end network for Panoptic Narrative Grounding that directly matches phrases to pixels, improving accuracy over previous two-stage methods.

Contribution

The paper proposes a novel one-stage Pixel-Phrase Matching Network with a Language-Compatible Pixel Aggregation (LCPA) module that improves phrase-to-pixel matching for the Panoptic Narrative Grounding (PNG) task.
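This page names the Language-Compatible Pixel Aggregation module but does not describe its internals. The sketch below is one hypothetical reading of "pixel aggregation", not the authors' design: over a few rounds, a phrase embedding selects its k most compatible pixels (scored by dot product) and adds their mean feature back into itself. The names `refine_phrase`, `k` and `rounds`, and the residual update, are illustrative assumptions.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product, used here as the phrase-pixel compatibility score."""
    return sum(x * y for x, y in zip(a, b))


def refine_phrase(phrase: Vector, pixels: List[Vector], k: int = 2, rounds: int = 2) -> Vector:
    """Hypothetical multi-round, language-guided pixel aggregation.

    phrase: one phrase embedding of dimension D.
    pixels: flattened pixel embeddings (H * W of them), each of dimension D.
    Each round selects the k pixels most compatible with the current phrase
    embedding and adds their mean feature to it (a residual update).
    """
    if not pixels:
        raise ValueError("need at least one pixel feature")
    k = min(k, len(pixels))
    current = list(phrase)
    for _ in range(rounds):
        # Rank pixels by compatibility with the current (already refined) phrase.
        top = sorted(pixels, key=lambda p: dot(current, p), reverse=True)[:k]
        # Mean of the selected pixel features: the aggregated visual context.
        context = [sum(column) / k for column in zip(*top)]
        # Residual update keeps the original linguistic signal.
        current = [c + v for c, v in zip(current, context)]
    return current


pixels = [[2.0, 0.0], [1.0, 1.0], [0.0, 3.0], [-1.0, 0.0]]
print(refine_phrase([1.0, 0.0], pixels, k=2, rounds=2))  # [4.0, 1.0]
```

Restricting aggregation to the top-k pixels, rather than averaging over the whole image, is what makes the gathered context phrase-specific in this sketch; a learned attention over all pixels would be a common alternative.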

Findings

01. Achieves state-of-the-art performance on the PNG benchmark
02. Outperforms previous two-stage methods by 4.0 points of absolute Average Recall
03. Demonstrates effective pixel-phrase correspondence modeling
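The second finding is stated in Average Recall (AR). As a hypothetical simplification, the sketch below treats AR as the recall of predicted phrase masks averaged over a grid of IoU thresholds. The COCO-style 0.50:0.05:0.95 grid is an assumption here, not necessarily the exact protocol of the PNG benchmark. Read this way, a gain of "4.0 absolute AR" is 4.0 points when AR is reported as a percentage.

```python
from typing import List, Sequence

Mask = List[List[int]]

# Hypothetical threshold grid (COCO-style 0.50, 0.55, ..., 0.95).
DEFAULT_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def mask_iou(pred: Mask, gt: Mask) -> float:
    """Intersection-over-union of two binary masks of equal shape."""
    inter = sum(p & g for prow, grow in zip(pred, gt) for p, g in zip(prow, grow))
    union = sum(p | g for prow, grow in zip(pred, gt) for p, g in zip(prow, grow))
    # Convention chosen here (an assumption): two empty masks score 0.
    return inter / union if union else 0.0


def average_recall(ious: Sequence[float], thresholds: Sequence[float] = DEFAULT_THRESHOLDS) -> float:
    """Mean over IoU thresholds of the fraction of phrases whose IoU reaches the threshold."""
    if not ious:
        raise ValueError("need at least one phrase")
    recalls = [sum(iou >= t for iou in ious) / len(ious) for t in thresholds]
    return sum(recalls) / len(recalls)


pred = [[1, 1], [0, 0]]
gt = [[1, 0], [0, 0]]
ious = [mask_iou(pred, gt), mask_iou(gt, gt)]  # [0.5, 1.0]
print(average_recall(ious))  # 0.55
```

Averaging over thresholds rewards masks that are precise, not merely overlapping: the 0.5-IoU mask above counts only at the loosest threshold, which is where pixel-level matching is expected to help over pooled region proposals.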

Abstract

Panoptic Narrative Grounding (PNG) is an emerging task whose goal is to segment visual objects of things and stuff categories described by dense narrative captions of a still image. The previous two-stage approach first extracts segmentation region proposals by an off-the-shelf panoptic segmentation model, then conducts coarse region-phrase matching to ground the candidate regions for each noun phrase. However, the two-stage pipeline usually suffers from the performance limitation of low-quality proposals in the first stage and the loss of spatial details caused by region feature pooling, as well as complicated strategies designed for things and stuff categories separately. To alleviate these drawbacks, we propose a one-stage end-to-end Pixel-Phrase Matching Network (PPMN), which directly matches each phrase to its corresponding pixels instead of region proposals and outputs panoptic…
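The core idea in the abstract, scoring each noun phrase directly against every pixel instead of against pooled region proposals, can be sketched in a few lines. This assumes that phrase and pixel embeddings already share a D-dimensional space (the language and visual encoders are out of scope) and that a mask is obtained by thresholding a sigmoid of their dot product. The function names, toy embeddings and 0.5 threshold are hypothetical and are not taken from the PPMN code.

```python
import math
from typing import List

Vector = List[float]
Mask = List[List[int]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def match_phrases_to_pixels(phrases: List[Vector], pixel_grid: List[List[Vector]],
                            threshold: float = 0.5) -> List[Mask]:
    """Return one binary H x W mask per phrase by scoring every pixel directly."""
    masks = []
    for phrase in phrases:
        mask = [[1 if sigmoid(dot(phrase, pixel)) > threshold else 0 for pixel in row]
                for row in pixel_grid]
        masks.append(mask)
    return masks


phrases = [[1.0, 0.0], [0.0, 1.0]]  # toy 2-D embeddings for two noun phrases
pixel_grid = [
    [[3.0, -3.0], [3.0, -3.0]],  # top row resembles phrase 0
    [[-3.0, 3.0], [-3.0, 3.0]],  # bottom row resembles phrase 1
]
for mask in match_phrases_to_pixels(phrases, pixel_grid):
    print(mask)
# [[1, 1], [0, 0]]
# [[0, 0], [1, 1]]
```

Because every phrase is scored against every pixel in the same way, nothing in this sketch distinguishes things from stuff, which is the simplification the abstract contrasts with the separately designed strategies of the two-stage pipeline.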

Code & Models

Repositories

dzh19990407/ppmn (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization