Visual Semantic Role Labeling for Video Understanding

Arka Sadhu; Tanmay Gupta; Mark Yatskar; Ram Nevatia; Aniruddha Kembhavi

arXiv:2104.00990 · cs.CV · April 5, 2021

TL;DR

This paper introduces a novel framework for video understanding using visual semantic role labeling, along with a large-scale annotated dataset called VidSitu, enabling detailed analysis of events and entities in movies.

Contribution

The paper presents the VidSitu benchmark dataset and a new framework for semantic role labeling in videos, advancing the understanding of complex, diverse movie clips.

Findings

01 VidSitu contains about 29K 10-second movie clips, each annotated with a verb and semantic roles every 2 seconds.

02 Standard models show room for improvement on semantic role labeling in videos.

03 Comprehensive analysis highlights challenges and opportunities in video event understanding.

Abstract

We propose a new framework for understanding and representing related salient events in a video using visual semantic role labeling. We represent videos as a set of related events, wherein each event consists of a verb and multiple entities that fulfill various roles relevant to that event. To study the challenging task of semantic role labeling in videos, or VidSRL, we introduce the VidSitu benchmark, a large-scale video understanding data source with $29K$ $10$-second movie clips richly annotated with a verb and semantic roles every $2$ seconds. Entities are co-referenced across events within a movie clip, and events are connected to each other via event-event relations. Clips in VidSitu are drawn from a large collection of movies ($\sim 3K$) and have been chosen to be both complex ($\sim 4.2$ unique verbs within a video) as well as diverse ($\sim 200$ verbs have more than $100$ …
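The representation described in the abstract (a clip as a set of related events, each with a verb and role-filling entities, annotated every 2 seconds) can be sketched as a small data structure. This is only an illustrative sketch, not the dataset's actual schema: the class names, field names, role names (`agent`, `listener`, `target`), the relation label `causes`, and the example entities are all hypothetical. Co-reference is approximated here by exact string match on entity descriptions, which is a simplification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

CLIP_SECONDS = 10   # clip length stated in the abstract
EVENT_SECONDS = 2   # annotation interval stated in the abstract


@dataclass
class Event:
    """One annotated segment: a verb plus entities filling its roles."""
    start: int                                            # segment start (seconds)
    end: int                                              # segment end (seconds)
    verb: str
    roles: Dict[str, str] = field(default_factory=dict)   # hypothetical role name -> entity


@dataclass
class ClipAnnotation:
    """A clip as a list of events plus event-event relations."""
    clip_id: str
    events: List[Event]
    # (source event index, target event index, hypothetical relation label)
    relations: List[Tuple[int, int, str]] = field(default_factory=list)

    def unique_verbs(self) -> Set[str]:
        return {event.verb for event in self.events}

    def coreferent_entities(self) -> Dict[str, List[int]]:
        """Map each entity seen in more than one event to those event indices."""
        seen: Dict[str, List[int]] = {}
        for index, event in enumerate(self.events):
            for entity in sorted(set(event.roles.values())):
                seen.setdefault(entity, []).append(index)
        return {entity: idxs for entity, idxs in seen.items() if len(idxs) > 1}


def segment_bounds(clip_seconds: int = CLIP_SECONDS,
                   step: int = EVENT_SECONDS) -> List[Tuple[int, int]]:
    """Split a clip into consecutive fixed-length segments."""
    return [(s, s + step) for s in range(0, clip_seconds, step)]


def example_clip() -> ClipAnnotation:
    """Build a made-up annotation; none of these values come from VidSitu."""
    verbs = ["walk", "talk", "talk", "run", "chase"]
    roles = [
        {"agent": "man in blue coat"},
        {"agent": "man in blue coat", "listener": "woman with hat"},
        {"agent": "woman with hat", "listener": "man in blue coat"},
        {"agent": "woman with hat"},
        {"agent": "man in blue coat", "target": "woman with hat"},
    ]
    events = [Event(s, e, v, r)
              for (s, e), v, r in zip(segment_bounds(), verbs, roles)]
    return ClipAnnotation("demo_clip", events, relations=[(3, 4, "causes")])
```

Under these assumptions, a 10-second clip yields five 2-second events, and calling `example_clip().unique_verbs()` gives the per-clip verb variety that the abstract's complexity figure summarizes across the dataset.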

Code & Models

Repositories

TheShadow29/VidSitu (PyTorch, official)
