VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation
Zijian Zhou, Miaojing Shi, Holger Caesar

TL;DR
VLPrompt combines vision information with language information from large language models (LLMs) to improve panoptic scene graph generation, especially for rare relations. Its novel prompting approach addresses the long-tail problem.
Contribution
The paper introduces VLPrompt, a novel vision-language prompting method that incorporates language information from LLMs to enhance relation prediction in panoptic scene graph generation (PSG), especially for infrequent relations.
Findings
VLPrompt outperforms previous methods on the PSG dataset.
Incorporating language information alleviates the long-tail relation problem.
An attention-based prompter network achieves precise relation prediction.
Abstract
Panoptic Scene Graph Generation (PSG) aims to achieve comprehensive image understanding by simultaneously segmenting objects and predicting the relations among them. However, the long-tail problem among relations leads to unsatisfactory results in real-world applications. Prior methods predominantly rely on vision information or use only limited language information, such as object or relation names, and thereby overlook the broader utility of language information. Leveraging recent progress in Large Language Models (LLMs), we propose to use language information to assist relation prediction, particularly for rare relations. To this end, we propose the Vision-Language Prompting (VLPrompt) model, which acquires vision information from images and language information from LLMs. Then, through a prompter network based on an attention mechanism, it achieves precise relation prediction. Our…
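The summary does not specify the prompter network's architecture, so the following is only a minimal, hypothetical sketch of how attention can combine the two information sources for one subject-object pair. It assumes a single vision feature per pair (the query) and one language embedding per candidate relation, standing in for LLM-derived relation information (keys and values). The names `cross_attention` and `predict_relation`, the residual fusion, and the toy 3-dimensional vectors are illustrative assumptions, not VLPrompt's actual design.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(query: Vector, keys: List[Vector],
                    values: List[Vector]) -> Tuple[Vector, List[float]]:
    """Scaled dot-product attention of one query over key/value vectors."""
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted sum of value vectors, one output coordinate at a time.
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights


def predict_relation(vision_feat: Vector, lang_feats: List[Vector],
                     relation_names: List[str]) -> Tuple[str, List[float], List[float]]:
    """Hypothetical prompter: the vision feature attends over per-relation
    language embeddings, and the fused feature scores each relation."""
    fused, attn = cross_attention(vision_feat, lang_feats, lang_feats)
    # Assumed residual fusion: add the attended language context to the vision feature.
    combined = [v + f for v, f in zip(vision_feat, fused)]
    # Score each candidate relation against its own language embedding.
    probs = softmax([dot(combined, l) for l in lang_feats])
    best = max(range(len(probs)), key=probs.__getitem__)
    return relation_names[best], probs, attn


# Toy example: made-up vectors, not real model features.
vision = [1.0, 0.0, 0.2]
relations = ["on", "holding", "parked on"]
language = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.1, 0.0, 1.0]]
label, probs, attn = predict_relation(vision, language, relations)
print(label)  # "on" for these toy vectors
```

In this sketch, attention lets the pair feature pull in context from every relation's language embedding rather than only matching a relation name. That is one simple way language information could help a relation that has few visual training examples.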
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
