Relation-aware Hierarchical Prompt for Open-vocabulary Scene Graph Generation
Tao Liu, Rongjie Li, Chongyu Wang, Xuming He

TL;DR
This paper introduces RAHP, a framework that enhances open-vocabulary scene graph generation (OV-SGG) by integrating relation-aware prompts and dynamic prompt selection, improving both image-text alignment and the diversity of text representations used for visual relationship detection.
Contribution
The paper proposes a relation-aware hierarchical prompting framework that leverages entity clustering, large language models, and dynamic prompt selection to improve open-vocabulary scene graph generation.
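Entity clustering matters because the number of relation triplet categories grows quickly: with |E| entity classes and |P| predicates there are |E|²·|P| subject-predicate-object combinations to describe. The sketch below is a minimal illustration under stated assumptions. It uses toy 2-D embeddings and plain k-means, and a hypothetical "a X or Y <predicate> a Z or W" template. The paper's actual embedding source, clustering algorithm, and prompt wording are not given in this summary, and `kmeans` and `cluster_prompts` are hypothetical names. It shows how grouping entities into k super-classes shrinks the prompt set to k²·|P|.

```python
# Hypothetical sketch: group entity classes into super-classes, then build
# one prompt per (subject cluster, predicate, object cluster).


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain k-means returning one cluster label per point."""
    # Farthest-point initialisation keeps the sketch deterministic.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_prompts(entity_vecs, predicates, k):
    """Cluster entity names, then emit cluster-level relation prompts."""
    names = list(entity_vecs)
    labels = kmeans([entity_vecs[n] for n in names], k)
    clusters = {}
    for name, lab in zip(names, labels):
        clusters.setdefault(lab, []).append(name)
    prompts = [
        f"a {' or '.join(clusters[s])} {p} a {' or '.join(clusters[o])}"
        for s in sorted(clusters)
        for p in predicates
        for o in sorted(clusters)
    ]
    return clusters, prompts


# Toy 2-D "text embeddings" (illustrative values, not from the paper).
ENTITY_VECS = {
    "man": [1.0, 0.1],
    "woman": [0.9, 0.2],
    "dog": [0.1, 1.0],
    "cat": [0.2, 0.9],
}
PREDICATES = ["riding", "near"]

clusters, prompts = cluster_prompts(ENTITY_VECS, PREDICATES, k=2)
full = len(ENTITY_VECS) ** 2 * len(PREDICATES)
print(len(prompts), "cluster-level prompts vs", full, "entity-level triplets")
```

With four entities in two clusters, this prints 8 cluster-level prompts against 32 entity-level triplets. The gap widens rapidly as the vocabulary grows.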
Findings
Achieves state-of-the-art performance on the Visual Genome and Open Images v6 datasets.
Effectively captures fine-grained visual interactions and diverse relationships.
Reduces noise and improves accuracy through adaptive prompt selection.
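The findings credit adaptive prompt selection with reducing noise. The sketch below illustrates the idea under stated assumptions: precomputed prompt embeddings, cosine similarity, and a hypothetical fixed top-k rule. This is not necessarily RAHP's actual selection mechanism. Each predicate is scored only by the prompts that best match the visual feature of a subject-object pair, and `predicate_score` is a hypothetical name. In the toy data, averaging every prompt lets one off-target description flip the prediction, while top-k selection does not.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def predicate_score(visual, prompt_embs, top_k=2):
    """Mean similarity of the top_k prompts that best match the visual feature.

    Prompts ranked below top_k (e.g. descriptions that do not fit this
    particular region) are dropped rather than averaged in.
    """
    sims = sorted((cosine(visual, e) for e in prompt_embs), reverse=True)
    kept = sims[:top_k]
    return sum(kept) / len(kept)


# Toy embeddings (illustrative values only, not from the paper).
VISUAL = [1.0, 0.0]
PROMPTS = {
    "riding": [[1.0, 0.0], [0.8, 0.6], [-1.0, 0.0]],  # last prompt is off-target
    "holding": [[0.0, 1.0], [0.6, 0.8]],
}

# Top-k selection versus naively averaging every prompt.
selected = {p: predicate_score(VISUAL, embs, top_k=2) for p, embs in PROMPTS.items()}
averaged = {p: predicate_score(VISUAL, embs, top_k=len(embs)) for p, embs in PROMPTS.items()}
best = max(selected, key=selected.get)
print(best, selected, averaged)
```

Here "riding" scores about 0.9 with selection but about 0.27 when all prompts are averaged, which drops it below "holding" (0.3). A learned or input-dependent selector would replace the fixed `top_k` in a real system.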
Abstract
Open-vocabulary Scene Graph Generation (OV-SGG) overcomes the limitations of the closed-set assumption by aligning visual relationship representations with open-vocabulary textual representations. This enables the identification of novel visual relationships, making it applicable to real-world scenarios with diverse relationships. However, existing OV-SGG methods are constrained by fixed text representations, limiting diversity and accuracy in image-text alignment. To address these challenges, we propose the Relation-Aware Hierarchical Prompting (RAHP) framework, which enhances text representation by integrating subject-object and region-specific relation information. Our approach utilizes entity clustering to address the complexity of relation triplet categories, enabling the effective integration of subject-object information. Additionally, we utilize a large language model (LLM) to…
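The abstract describes OV-SGG as aligning visual relationship representations with open-vocabulary textual representations. The sketch below shows why this lets a model recognize a predicate it never saw during training: a new predicate only needs a text embedding at inference time. It assumes a shared embedding space, cosine similarity, and a CLIP-style softmax with a hypothetical temperature. The vectors are toy values and `classify_relation` is a hypothetical name, not RAHP's code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def classify_relation(visual, text_embs, temperature=0.01):
    """Predicate probabilities from visual-text similarity (softmax over cosines)."""
    names = list(text_embs)
    logits = [cosine(visual, text_embs[n]) / temperature for n in names]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return {n: e / z for n, e in zip(names, exps)}


# Predicates seen in training (toy text embeddings).
BASE = {"on": [1.0, 0.0, 0.0], "holding": [0.0, 1.0, 0.0]}
# A novel predicate added at test time, with no training examples.
VOCAB = {**BASE, "feeding": [0.0, 0.6, 0.8]}

visual = [0.0, 0.5, 0.9]  # toy visual feature of a subject-object pair
probs = classify_relation(visual, VOCAB)
best = max(probs, key=probs.get)
print(best, probs)
```

Because the classifier is just a similarity lookup, its quality depends directly on the text side. This is the limitation of fixed text representations that the abstract says RAHP addresses.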
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
Methods: Relation-aware Hierarchical Prompting (RAHP)
