Visual Semantic Parsing: From Images to Abstract Meaning Representation

Mohamed Ashraf Abdelsalam; Zhan Shi; Federico Fancellu; Kalliopi Basioti; Dhaivat J. Bhatt; Vladimir Pavlovic; Afsaneh Fazly

arXiv:2210.14862·cs.CV·October 28, 2022

TL;DR

This paper introduces a novel approach to visual scene understanding by converting images into Abstract Meaning Representation graphs, leveraging NLP techniques to capture high-level semantics and unify multiple descriptions.

Contribution

The paper adapts a text-based AMR parser to images, producing linguistically informed semantic graphs that capture more than traditional scene graphs and enable richer scene understanding.

Findings

01. Successfully repurposed a text-to-AMR parser for images

02. Generated unified meta-AMR graphs from multiple descriptions

03. Demonstrated potential for improved scene understanding

Abstract

The success of scene graphs for visual scene understanding has brought attention to the benefits of abstracting a visual input (e.g., image) into a structured representation, where entities (people and objects) are nodes connected by edges specifying their relations. Building these representations, however, requires expensive manual annotation in the form of images paired with their scene graphs or frames. These formalisms remain limited in the nature of entities and relations they can capture. In this paper, we propose to leverage a widely-used meaning representation in the field of natural language processing, the Abstract Meaning Representation (AMR), to address these shortcomings. Compared to scene graphs, which largely emphasize spatial relationships, our visual AMR graphs are more linguistically informed, with a focus on higher-level semantic concepts extrapolated from visual…
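
To make the contrast in the abstract concrete, the sketch below stores a scene graph and an AMR-style graph as plain triples. It is an illustration only, not the authors' parser or data format: the example image content, the caption "A man is riding a horse", the variable names (r, m, h), and the helper functions are all assumptions made for this example. The AMR triples follow the usual convention of an ":instance" edge naming each node's concept and ":ARG0"/":ARG1" edges for the roles of a predicate such as ride-01.

```python
from collections import defaultdict

# Hypothetical scene graph: entities are nodes, edges name (mostly spatial) relations.
scene_graph = [
    ("man", "on", "horse"),
    ("man", "wearing", "hat"),
]

# Hypothetical AMR-style graph for the caption "A man is riding a horse".
# Each variable (r, m, h) is a node; ":instance" gives the node's concept.
amr_triples = [
    ("r", ":instance", "ride-01"),
    ("m", ":instance", "man"),
    ("h", ":instance", "horse"),
    ("r", ":ARG0", "m"),  # the rider
    ("r", ":ARG1", "h"),  # the thing ridden
]


def to_adjacency(triples):
    """Group (source, relation, target) triples by source node."""
    adjacency = defaultdict(list)
    for source, relation, target in triples:
        adjacency[source].append((relation, target))
    return dict(adjacency)


def concepts(triples):
    """Map each AMR variable to its concept via the ':instance' edges."""
    return {s: t for s, r, t in triples if r == ":instance"}


def readable_edges(triples):
    """Rewrite role edges with concept names in place of variables."""
    names = concepts(triples)
    return [(names[s], r, names[t]) for s, r, t in triples if r != ":instance"]


if __name__ == "__main__":
    print(to_adjacency(scene_graph))
    print(readable_edges(amr_triples))
```

In this toy example the scene graph records that the man is "on" the horse, while the AMR-style graph centers on the event ride-01 and attaches the man and the horse as its arguments, which is the kind of higher-level, predicate-centered structure the abstract refers to.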

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Advanced Image and Video Retrieval Techniques