Visual Semantic Parsing: From Images to Abstract Meaning Representation
Mohamed Ashraf Abdelsalam, Zhan Shi, Federico Fancellu, Kalliopi, Basioti, Dhaivat J. Bhatt, Vladimir Pavlovic, Afsaneh Fazly

TL;DR
This paper introduces a novel approach to visual scene understanding by converting images into Abstract Meaning Representation graphs, leveraging NLP techniques to capture high-level semantics and unify multiple descriptions.
Contribution
It adapts a text-based AMR parser for images, creating linguistically informed semantic graphs that go beyond traditional scene graphs, enabling richer scene understanding.
Findings
Successfully repurposed a text-to-AMR parser for images
Generated unified meta-AMR graphs from multiple descriptions
Demonstrated potential for improved scene understanding
Abstract
The success of scene graphs for visual scene understanding has brought attention to the benefits of abstracting a visual input (e.g., image) into a structured representation, where entities (people and objects) are nodes connected by edges specifying their relations. Building these representations, however, requires expensive manual annotation in the form of images paired with their scene graphs or frames. These formalisms remain limited in the nature of entities and relations they can capture. In this paper, we propose to leverage a widely-used meaning representation in the field of natural language processing, the Abstract Meaning Representation (AMR), to address these shortcomings. Compared to scene graphs, which largely emphasize spatial relationships, our visual AMR graphs are more linguistically informed, with a focus on higher-level semantic concepts extrapolated from visual…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Natural Language Processing Techniques · Advanced Image and Video Retrieval Techniques
