GPT4SGG: Synthesizing Scene Graphs from Holistic and Region-specific Narratives
Zuyao Chen, Jinlin Wu, Zhen Lei, Zhaoxiang Zhang, and Changwen Chen

TL;DR
GPT4SGG uses large language models (LLMs) to synthesize scene graphs from both holistic and region-specific narratives of an image. This improves scene graph generation (SGG) from caption data and mitigates grounding ambiguity and caption bias.
Contribution
The paper proposes GPT4SGG, a divide-and-conquer approach. Each scene is decomposed into regions, and large language models combine the region-specific narratives with a holistic one to synthesize accurate scene graphs from complex caption data.
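The divide-and-conquer idea can be made concrete with a minimal sketch. It is written under stated assumptions and is not the authors' implementation. It assumes each image comes with localized objects (a category plus a box), one holistic caption, and captions for a few regions. The names `Obj`, `Region`, `contains`, `build_prompt`, and `merge_triplets` are hypothetical, the prompt wording is invented for illustration, and the LLM call itself is omitted. The sketch serializes each region-specific narrative together with the objects that region covers, then merges the triplets returned per region into one deduplicated scene graph.

```python
from dataclasses import dataclass

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = tuple[float, float, float, float]


@dataclass
class Obj:
    obj_id: int
    category: str
    box: Box


@dataclass
class Region:
    box: Box
    caption: str  # region-specific narrative (assumed to be given)


def contains(outer: Box, inner: Box) -> bool:
    """Return True if `inner` lies entirely inside `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def build_prompt(holistic: str, regions: list[Region], objects: list[Obj]) -> str:
    """Serialize the holistic caption plus each region caption with the objects it covers.

    The prompt wording is an illustrative assumption, not the paper's prompt.
    """
    lines = [f"Image caption: {holistic}"]
    for i, region in enumerate(regions):
        ids = [f"{o.category}.{o.obj_id}" for o in objects if contains(region.box, o.box)]
        lines.append(f"Region {i} {region.box}: {region.caption} | objects: {', '.join(ids)}")
    lines.append("List relationships as (subject, predicate, object) using the object ids above.")
    return "\n".join(lines)


def merge_triplets(per_region: list[list[tuple[str, str, str]]]) -> list[tuple[str, str, str]]:
    """Union region-level triplets into one scene graph, dropping duplicates in first-seen order."""
    seen: dict[tuple[str, str, str], None] = {}
    for triplets in per_region:
        for t in triplets:
            seen.setdefault(t, None)
    return list(seen)
```

For example, with a man and a cup inside region 0 and a dog inside region 1, the prompt line for region 0 lists `man.0, cup.1` and the line for region 1 lists only `dog.2`. Restricting each narrative to the objects inside its region keeps every query small and localized, which is the intuition behind decomposing the scene.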
Findings
Significant performance improvement in scene graph accuracy.
Effective handling of ambiguity and long-tail bias.
Enhanced comprehensiveness of generated scene graphs.
Abstract
Training Scene Graph Generation (SGG) models with natural language captions has become increasingly popular because natural language offers abundant, cost-effective supervision signals with open-world generalization. However, such unstructured caption data and its processing pose significant challenges for learning accurate and comprehensive scene graphs. The challenges fall into three aspects: 1) traditional scene graph parsers based on linguistic representations often fail to extract meaningful relationship triplets from caption data; 2) grounding the unlocalized objects of parsed triplets runs into ambiguity in visual-language alignment; 3) caption data are typically sparse and biased toward partial observations of the image content. To address these problems, we propose a divide-and-conquer strategy with a novel framework named GPT4SGG, to obtain…
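The grounding ambiguity named in the second challenge can be illustrated with a toy example. This is a hypothetical sketch, not code from the paper. It assumes a caption parser has already produced a triplet of category labels and that a detector supplies one category label per detected box. `ground_triplet` is an illustrative name, and matching on labels alone is the naive strategy assumed here.

```python
from itertools import product


def ground_triplet(triplet: tuple[str, str, str],
                   detections: dict[int, str]) -> list[tuple[int, int]]:
    """Return every (subject_id, object_id) pair whose categories match the triplet's labels.

    `detections` maps a detection id to its category label. Matching by label
    alone cannot tell instances of the same category apart, so several pairs
    may be returned for a single parsed triplet.
    """
    subj, _predicate, obj = triplet
    subj_ids = [d for d, cat in detections.items() if cat == subj]
    obj_ids = [d for d, cat in detections.items() if cat == obj]
    # Exclude pairing a detection with itself (e.g. "person next to person").
    return [(s, o) for s, o in product(subj_ids, obj_ids) if s != o]
```

With detections `{0: "person", 1: "person", 2: "cup"}`, calling `ground_triplet(("person", "holding", "cup"), ...)` returns `[(0, 2), (1, 2)]`. Both are two equally plausible groundings, and the caption alone gives no signal for choosing between them.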
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training · Attention Is All You Need · Linear Layer · Residual Connection · Dropout · Softmax · Multi-Head Attention · Byte Pair Encoding · Adam · Absolute Position Encodings
