OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models

Zijian Zhou; Zheng Zhu; Holger Caesar; Miaojing Shi

arXiv:2407.11213 · cs.CV · July 17, 2024

TL;DR

This paper introduces OpenPSG, a novel approach leveraging large multimodal models for open-set panoptic scene graph generation, enabling relation prediction beyond predefined categories in complex images.

Contribution

It is the first to pose open-set PSG as a task, and it develops a relation query transformer with autoregressive generation for open-set relation prediction.

Findings

01 Achieves state-of-the-art performance in open-set relation prediction.

02 Effectively filters irrelevant object pairs to improve prediction efficiency.

03 Demonstrates the feasibility of open-set PSG with large multimodal models.

Abstract

Panoptic Scene Graph Generation (PSG) aims to segment objects and recognize their relations, enabling a structured understanding of an image. Previous methods focus on predicting predefined object and relation categories, which limits their applicability in open-world scenarios. With the rapid development of large multimodal models (LMMs), significant progress has been made in open-set object detection and segmentation, yet open-set relation prediction in PSG remains unexplored. In this paper, we focus on the task of open-set relation prediction, integrated with a pretrained open-set panoptic segmentation model, to achieve true open-set panoptic scene graph generation (OpenPSG). Our OpenPSG leverages LMMs to achieve open-set relation prediction in an autoregressive manner. We introduce a relation query transformer to efficiently extract visual features of object pairs and estimate…
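
The overall flow described above (segment objects, discard object pairs that are unlikely to be related, then generate a free-form relation for each remaining pair) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the truncated abstract does not say how pairs are scored or how relations are decoded. The names `filter_pairs`, `build_scene_graph`, `pair_score`, `describe_relation`, and the 0.5 threshold are hypothetical, and the toy lookup tables stand in for the relation query transformer and the LMM.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[int, int]           # (subject index, object index)
Triplet = Tuple[str, str, str]   # (subject label, relation text, object label)


def filter_pairs(num_objects: int,
                 pair_score: Callable[[int, int], float],
                 threshold: float = 0.5) -> List[Pair]:
    """Keep ordered (subject, object) pairs whose score reaches the threshold.

    `pair_score` is a hypothetical stand-in for whatever relation-existence
    estimate a real system would compute from visual features.
    """
    return [(s, o) for s, o in permutations(range(num_objects), 2)
            if pair_score(s, o) >= threshold]


def build_scene_graph(labels: Sequence[str],
                      pair_score: Callable[[int, int], float],
                      describe_relation: Callable[[int, int], str],
                      threshold: float = 0.5) -> List[Triplet]:
    """Call the (hypothetical) relation generator only for surviving pairs."""
    triplets = []
    for s, o in filter_pairs(len(labels), pair_score, threshold):
        # In a real system this call would be the expensive text-generation step.
        triplets.append((labels[s], describe_relation(s, o), labels[o]))
    return triplets


# Toy stand-ins (assumptions for illustration only, not model outputs).
labels = ["person", "horse", "sky"]
toy_scores = {(0, 1): 0.9, (1, 0): 0.2, (0, 2): 0.1}
toy_relations = {(0, 1): "riding"}

graph = build_scene_graph(
    labels,
    pair_score=lambda s, o: toy_scores.get((s, o), 0.0),
    describe_relation=lambda s, o: toy_relations.get((s, o), "unknown"),
)
print(graph)  # [('person', 'riding', 'horse')]
```

With three objects there are six ordered pairs, but only one passes the threshold in this toy setup, so the relation generator is called once instead of six times. That is the kind of saving that pair filtering is meant to provide.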

Code & Models

Repositories

franciszzj/openpsg (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques