GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement

Zhi-Qi Cheng; Qi Dai; Siyao Li; Teruko Mitamura; Alexander G. Hauptmann

arXiv:2208.08965·cs.CV·November 29, 2022

TL;DR

GSRFormer introduces a novel two-stage transformer-based framework for Grounded Situation Recognition that models bidirectional relations between verbs and semantic roles, improving understanding and accuracy over existing methods.

Contribution

It proposes a new framework that postpones verb detection, learns intermediate role representations, and exploits semantic relations bidirectionally, outperforming prior approaches.

Findings

01. Outperforms state-of-the-art methods on SWiG benchmarks

02. Effectively models semantic relations between verbs and roles

03. Utilizes support images for improved learning

Abstract

Grounded Situation Recognition (GSR) aims to generate structured semantic summaries of images for "human-like" event understanding. Specifically, the GSR task not only detects the salient activity verb (e.g., buying), but also predicts all corresponding semantic roles (e.g., agent and goods). Inspired by object detection and image captioning tasks, existing methods typically employ a two-stage framework: 1) detect the activity verb, and then 2) predict semantic roles based on the detected verb. Obviously, this illogical framework constitutes a huge obstacle to semantic understanding. First, pre-detecting verbs solely without semantic roles inevitably fails to distinguish many similar daily activities (e.g., offering and giving, buying and selling). Second, predicting semantic roles in a closed auto-regressive manner can hardly exploit the semantic relations among the verb and roles. To this…
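To make the "structured semantic summary" concrete, here is a minimal Python sketch of what a GSR output for the abstract's buying example might look like: one verb plus a filler for each semantic role, optionally grounded to an image region. The class names (`Situation`, `RoleFiller`), the box convention, and the nouns and coordinates are assumptions for illustration only; they are not taken from the paper or from the zhiqic/gsrformer repository.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
BBox = Tuple[float, float, float, float]


@dataclass
class RoleFiller:
    """The entity that fills one semantic role, with optional grounding."""
    noun: str
    box: Optional[BBox] = None  # None when the entity is not localized


@dataclass
class Situation:
    """A structured summary: the activity verb plus its semantic roles."""
    verb: str
    roles: Dict[str, RoleFiller] = field(default_factory=dict)

    def summary(self) -> str:
        # Render as verb(role=noun, ...), in role insertion order.
        parts = [f"{role}={filler.noun}" for role, filler in self.roles.items()]
        return f"{self.verb}({', '.join(parts)})"


# Illustrative values only; nouns and coordinates are made up.
buying = Situation(
    verb="buying",
    roles={
        "agent": RoleFiller("woman", (10.0, 20.0, 110.0, 220.0)),
        "goods": RoleFiller("bread", (120.0, 90.0, 180.0, 140.0)),
    },
)
print(buying.summary())
```

In this representation, look-alike activities such as buying and selling can share the same entities and boxes and differ only in the verb and in which role each entity fills, which is why the abstract argues that verbs and roles should inform each other.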

Code & Models

Repositories

zhiqic/gsrformer (PyTorch, official)
