SGRAM: Improving Scene Graph Parsing via Abstract Meaning Representation
Woo Suk Choi, Yu-Jung Heo, Byoung-Tak Zhang

TL;DR
This paper introduces SGRAM, a novel two-stage framework that uses abstract meaning representation to improve scene graph parsing from textual descriptions, outperforming previous dependency parsing methods and enhancing image retrieval tasks.
Contribution
SGRAM is the first to utilize AMR for scene graph parsing from text, significantly improving accuracy over dependency parsing-based models and leveraging Transformer models for better scene graph generation.
Findings
SGRAM outperforms dependency parsing-based models by 11.61%.
SGRAM surpasses previous state-of-the-art Transformer-based models by 3.78%.
Scene graphs generated by SGRAM improve image retrieval performance.
Abstract
A scene graph is a structured semantic representation that can be modeled as a graph from images and texts. Image-based scene graph generation research has been actively conducted until recently, whereas text-based scene graph generation research has not. In this paper, we focus on the problem of scene graph parsing from a textual description of a visual scene. The core idea is to use abstract meaning representation (AMR) instead of the dependency parsing mainly used in previous studies. AMR is a graph-based semantic formalism of natural language that abstracts the concepts of words in a sentence, in contrast to dependency parsing, which considers dependency relationships among all words in a sentence. To this end, we design a simple yet effective two-stage scene graph parsing framework utilizing abstract meaning representation, SGRAM (Scene GRaph parsing via Abstract Meaning…
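To make the contrast in the abstract more concrete, here is a minimal sketch of how an AMR graph relates to a text scene graph of objects, attributes, and relations. Everything in it is an illustrative assumption: the example sentence, the hand-written AMR triples, and the names `SceneGraph`, `strip_sense`, and `toy_amr_to_scene_graph` do not come from the paper. The hand-coded rules stand in for the second stage only for illustration; they are not SGRAM's model.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical container for a text scene graph."""
    objects: set = field(default_factory=set)
    attributes: set = field(default_factory=set)  # (object, attribute)
    relations: set = field(default_factory=set)   # (subject, predicate, object)


# Hand-written AMR-style triples (an assumption, not parser output) for
# "A young boy is riding a brown horse."
# Roughly, in PENMAN notation:
# (r / ride-01 :ARG0 (b / boy :mod (y / young)) :ARG1 (h / horse :mod (b2 / brown)))
AMR_TRIPLES = [
    ("r", ":instance", "ride-01"),
    ("b", ":instance", "boy"),
    ("y", ":instance", "young"),
    ("h", ":instance", "horse"),
    ("b2", ":instance", "brown"),
    ("r", ":ARG0", "b"),
    ("r", ":ARG1", "h"),
    ("b", ":mod", "y"),
    ("h", ":mod", "b2"),
]


def strip_sense(concept):
    """Drop a numeric sense suffix, e.g. 'ride-01' -> 'ride'."""
    head, sep, tail = concept.rpartition("-")
    return head if sep and tail.isdigit() else concept


def toy_amr_to_scene_graph(triples):
    """Toy rule-based mapping; NOT the learned model described in the paper."""
    # Map each AMR variable to its concept label.
    concept = {s: strip_sense(t) for s, r, t in triples if r == ":instance"}
    graph = SceneGraph()
    args = {}
    for s, r, t in triples:
        if r == ":mod":
            # A modifier edge becomes an (object, attribute) pair.
            graph.attributes.add((concept[s], concept[t]))
        elif r in (":ARG0", ":ARG1"):
            args.setdefault(s, {})[r] = t
    for pred, roles in args.items():
        # A predicate with both core arguments becomes a relation triple.
        if ":ARG0" in roles and ":ARG1" in roles:
            subj = concept[roles[":ARG0"]]
            obj = concept[roles[":ARG1"]]
            graph.objects.update({subj, obj})
            graph.relations.add((subj, concept[pred], obj))
    return graph


graph = toy_amr_to_scene_graph(AMR_TRIPLES)
print(graph)
```

Note that the AMR triples contain no nodes for function words such as "a" or "is", whereas a dependency parse assigns an edge to every token. That is the sense in which AMR abstracts over the words of the sentence.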
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Linear Layer · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Layer Normalization · Residual Connection · Dropout
