Panoptic Scene Graph Generation
Jingkang Yang, Yi Zhe Ang, Zujin Guo, Kaiyang Zhou, Wayne Zhang, and Ziwei Liu

TL;DR
This paper introduces panoptic scene graph generation (PSG), a new task that builds scene graphs from panoptic segmentations instead of bounding boxes so that background context is covered as well, along with a new dataset and baseline models.
Contribution
The paper proposes PSG as a novel task, creates a high-quality dataset, and develops baseline models including Transformer-based methods for improved scene understanding.
Findings
The PSG dataset contains 49k well-annotated images drawn from the overlap between COCO and Visual Genome.
Two Transformer-based models, PSGTR and PSGFormer, outperform traditional baselines.
PSG enables more detailed and context-aware scene graph generation.
Abstract
Existing research addresses scene graph generation (SGG) -- a critical technology for scene understanding in images -- from a detection perspective, i.e., objects are detected using bounding boxes followed by prediction of their pairwise relationships. We argue that such a paradigm causes several problems that impede the progress of the field. For instance, bounding box-based labels in current datasets usually contain redundant classes like hairs, and leave out background information that is crucial to the understanding of context. In this work, we introduce panoptic scene graph generation (PSG), a new problem task that requires the model to generate a more comprehensive scene graph representation based on panoptic segmentations rather than rigid bounding boxes. A high-quality PSG dataset, which contains 49k well-annotated overlapping images from COCO and Visual Genome, is created for…
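To make the representation described in the abstract concrete, here is a minimal sketch of what a panoptic scene graph could look like as a data structure: every pixel belongs to exactly one segment (objects and background alike), and relations are subject-predicate-object triplets between segments. All class names, field names, and the toy image are hypothetical and chosen for illustration; they are not the format of the PSG dataset or the authors' code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Segment:
    segment_id: int   # id stored in the panoptic map
    category: str     # e.g. "person" (thing) or "grass" (stuff)
    is_thing: bool    # False for background "stuff" classes


@dataclass(frozen=True)
class Relation:
    subject_id: int
    predicate: str
    object_id: int


@dataclass
class PanopticSceneGraph:
    # H x W grid; each pixel holds exactly one segment id, so masks never overlap
    panoptic_map: List[List[int]]
    segments: Dict[int, Segment]
    relations: List[Relation] = field(default_factory=list)

    def add_relation(self, subject_id: int, predicate: str, object_id: int) -> None:
        # Relations may only connect segments that exist in this image
        for sid in (subject_id, object_id):
            if sid not in self.segments:
                raise KeyError(f"unknown segment id: {sid}")
        self.relations.append(Relation(subject_id, predicate, object_id))

    def area(self, segment_id: int) -> int:
        # Number of pixels assigned to the segment
        return sum(row.count(segment_id) for row in self.panoptic_map)

    def triplets(self) -> List[Tuple[str, str, str]]:
        # Human-readable (subject, predicate, object) category triplets
        return [
            (self.segments[r.subject_id].category,
             r.predicate,
             self.segments[r.object_id].category)
            for r in self.relations
        ]


# Toy 3 x 4 image: sky (1), a person (2), and grass (3)
graph = PanopticSceneGraph(
    panoptic_map=[
        [1, 1, 1, 1],
        [1, 2, 2, 1],
        [3, 3, 3, 3],
    ],
    segments={
        1: Segment(1, "sky", is_thing=False),
        2: Segment(2, "person", is_thing=True),
        3: Segment(3, "grass", is_thing=False),
    },
)
graph.add_relation(2, "standing on", 3)
print(graph.triplets())
```

Unlike a bounding-box scene graph, the background segments ("sky", "grass") are nodes in their own right here, which is what lets a relation such as "person standing on grass" be grounded at the pixel level.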
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
Methods: Attention Is All You Need · Linear Layer · Dropout · Multi-Head Attention · Absolute Position Encodings · Layer Normalization · Position-Wise Feed-Forward Layer · Softmax · Byte Pair Encoding · Adam
