Panoptic Scene Graph Generation

Jingkang Yang; Yi Zhe Ang; Zujin Guo; Kaiyang Zhou; Wayne Zhang; Ziwei Liu

arXiv:2207.11247·cs.CV·July 25, 2022

TL;DR

This paper introduces panoptic scene graph generation (PSG), a new task that builds scene graphs from panoptic segmentation instead of bounding boxes so that the graph covers the whole scene, and it provides a new dataset and baseline models.

Contribution

The paper proposes PSG as a novel task, creates a high-quality dataset, and develops baseline models including Transformer-based methods for improved scene understanding.

Findings

01. The PSG dataset contains 49k annotated images drawn from the overlap of COCO and Visual Genome.

02. Two Transformer-based models, PSGTR and PSGFormer, outperform traditional baselines.

03. Because panoptic segmentation also labels background regions, PSG scene graphs capture context that bounding-box annotations leave out.

Abstract

Existing research addresses scene graph generation (SGG) -- a critical technology for scene understanding in images -- from a detection perspective, i.e., objects are detected using bounding boxes followed by prediction of their pairwise relationships. We argue that such a paradigm causes several problems that impede the progress of the field. For instance, bounding box-based labels in current datasets usually contain redundant classes like hairs, and leave out background information that is crucial to the understanding of context. In this work, we introduce panoptic scene graph generation (PSG), a new problem task that requires the model to generate a more comprehensive scene graph representation based on panoptic segmentations rather than rigid bounding boxes. A high-quality PSG dataset, which contains 49k well-annotated overlapping images from COCO and Visual Genome, is created for…
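To make the task output concrete, here is a minimal Python sketch of how a panoptic scene graph could be represented. It is an illustration only, not the authors' data format or the OpenPSG API: the class names `Segment` and `PanopticSceneGraph`, the helper `toy_example`, and the labels and predicate used are all hypothetical. The one idea it takes from the abstract is that relations connect panoptic segments (foreground objects and background regions alike) rather than bounding boxes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    label: str       # category name, e.g. "person" or "grass" (illustrative)
    is_thing: bool   # True for countable objects, False for background "stuff"


@dataclass
class PanopticSceneGraph:
    # seg_map[y][x] is the index into `segments` of the segment owning that pixel.
    # A single index map means every pixel belongs to exactly one segment.
    seg_map: List[List[int]]
    segments: List[Segment]
    # Each relation is (subject_index, predicate, object_index).
    relations: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_relation(self, subj: int, predicate: str, obj: int) -> None:
        """Add a directed relation between two distinct segments."""
        n = len(self.segments)
        if not (0 <= subj < n and 0 <= obj < n):
            raise IndexError("segment index out of range")
        if subj == obj:
            raise ValueError("a relation needs two distinct segments")
        self.relations.append((subj, predicate, obj))

    def pixel_area(self, index: int) -> int:
        """Count the pixels assigned to one segment."""
        return sum(row.count(index) for row in self.seg_map)

    def triplets(self) -> List[Tuple[str, str, str]]:
        """Return relations as (subject label, predicate, object label)."""
        return [
            (self.segments[s].label, p, self.segments[o].label)
            for s, p, o in self.relations
        ]


def toy_example() -> PanopticSceneGraph:
    """Build a 2x3 toy image: segment 0 is a person, segment 1 is grass."""
    graph = PanopticSceneGraph(
        seg_map=[[0, 0, 1],
                 [1, 1, 1]],
        segments=[Segment("person", True), Segment("grass", False)],
    )
    graph.add_relation(0, "standing on", 1)
    return graph
```

In this sketch the background region ("grass") is a first-class node, so a relation such as person "standing on" grass can be expressed directly, which a graph limited to bounding boxes around foreground objects would typically not include.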

Code & Models

Repositories

Jingkang50/OpenPSG
pytorch · Official

Datasets

maelic/PSG-coco-format
dataset · 336 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications

Methods: Attention Is All You Need · Linear Layer · Dropout · Multi-Head Attention · Absolute Position Encodings · Layer Normalization · Position-Wise Feed-Forward Layer · Softmax · Byte Pair Encoding · Adam