Scene Graph Generation with Role-Playing Large Language Models

Guikun Chen; Jin Li; Wenguan Wang

arXiv:2410.15364·cs.CV·October 22, 2024

TL;DR

This paper introduces SDSGG, a scene-specific scene graph generation framework that uses role-playing large language models to adapt text classifiers based on scene content, significantly improving relation recognition accuracy.

Contribution

The work proposes a novel scene-specific OVSGG framework with adaptive text classifiers generated by role-playing LLMs and a mutual visual adapter for better relation modeling.

Findings

01 SDSGG outperforms existing methods on benchmark datasets.

02 Adaptive scene-specific classifiers improve relation detection.

03 Role-playing LLMs enhance descriptive feature analysis.

Abstract

Current approaches for open-vocabulary scene graph generation (OVSGG) use vision-language models such as CLIP and follow a standard zero-shot pipeline -- computing similarity between the query image and the text embeddings for each category (i.e., text classifiers). In this work, we argue that the text classifiers adopted by existing OVSGG methods, i.e., category-/part-level prompts, are scene-agnostic as they remain unchanged across contexts. Using such fixed text classifiers not only struggles to model visual relations with high variance, but also falls short in adapting to distinct contexts. To plug these intrinsic shortcomings, we devise SDSGG, a scene-specific description based OVSGG framework where the weights of text classifiers are adaptively adjusted according to the visual content. In particular, to generate comprehensive and diverse descriptions oriented to the scene, an LLM…
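The contrast the abstract draws between fixed and scene-adaptive text classifiers can be illustrated with a toy example. The sketch below is not the authors' implementation: the function names, the two-dimensional embeddings, and the hand-set weights are all hypothetical, and in SDSGG the weights would be derived from the visual content rather than supplied manually. It only shows how a standard zero-shot pipeline scores each category by cosine similarity, and how per-description weights could change those scores.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_scores(image_emb, text_embs):
    """Scene-agnostic classifier: one fixed text embedding per category."""
    return {name: cosine(image_emb, emb) for name, emb in text_embs.items()}


def weighted_scores(image_emb, description_embs, weights):
    """Illustrative adaptive variant: each category has several description
    embeddings, and each description's similarity is scaled by a weight.
    The weights here are hypothetical inputs, not the paper's formulation."""
    scores = {}
    for name, embs in description_embs.items():
        sims = [cosine(image_emb, e) for e in embs]
        scores[name] = sum(w * s for w, s in zip(weights[name], sims)) / len(sims)
    return scores


# Toy embeddings (assumed values, standing in for CLIP features).
image_emb = [1.0, 0.0]
text_embs = {"riding": [1.0, 0.0], "holding": [0.0, 1.0]}
fixed = zero_shot_scores(image_emb, text_embs)

description_embs = {
    "riding": [[1.0, 0.0], [0.0, 1.0]],
    "holding": [[0.6, 0.8], [0.0, 1.0]],
}
# A weight of 0.0 switches off a description judged irrelevant to this scene.
weights = {"riding": [1.0, 0.0], "holding": [1.0, 0.0]}
adaptive = weighted_scores(image_emb, description_embs, weights)

predicted = max(adaptive, key=adaptive.get)
```

With fixed classifiers the score for a category is the same function of the image in every scene, whereas the weighted variant lets the contribution of each description vary with context, which is the property the abstract attributes to scene-specific descriptions.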

Code & Models

Repositories

guikunchen/sdsgg (PyTorch, official)

Videos

Scene Graph Generation with Role-Playing Large Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques

Methods: Contrastive Language-Image Pre-training