In Defense of Structural Symbolic Representation for Video Event-Relation Prediction

Andrew Lu; Xudong Lin; Yulei Niu; Shih-Fu Chang

arXiv:2301.03410 · cs.CV · April 13, 2023 · 1 citation

Open Access

TL;DR

This paper defends the use of structural symbolic representations (SSR) for video event-relation prediction, identifies reasons for past failures, and introduces an improved model, enhanced with factual knowledge, that achieves state-of-the-art results.

Contribution

It provides an empirical analysis of SSR-based methods, addresses challenges in the evaluation setting, and augments the model with external factual knowledge to improve performance.

Findings

01. Identified suboptimal training as a cause of previous SSR failures.
02. Showed that evaluation based solely on videos is currently infeasible.
03. Achieved a 25% macro-accuracy boost with the new model.
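The third finding is reported in macro-accuracy, which this summary does not define. A common definition, assumed here, is the unweighted mean of per-class accuracy, so every relation class counts equally no matter how often it occurs. The following is a minimal sketch under that assumption; the function names and relation labels are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Plain (micro) accuracy: fraction of all examples predicted correctly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_accuracy(y_true, y_pred):
    """Assumed definition: mean of per-class accuracy over the gold classes."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = defaultdict(int)  # correct predictions per gold class
    total = defaultdict(int)    # number of examples per gold class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical relation labels, for illustration only.
gold = ["causes", "causes", "causes", "enables", "no_relation"]
pred = ["causes", "causes", "causes", "causes", "causes"]
print(f"{accuracy(gold, pred):.2f}")        # 0.60
print(f"{macro_accuracy(gold, pred):.2f}")  # 0.33
```

On this toy input, always predicting the majority class scores 0.60 plain accuracy but only 0.33 macro-accuracy, which illustrates why macro-averaging penalizes majority-class guessing on imbalanced relation labels.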

Abstract

Understanding event relationships in videos requires a model to understand the underlying structures of events (i.e., the event type, the associated argument roles, and corresponding entities) and factual knowledge for reasoning. Structural symbolic representation (SSR) based methods directly take event types and associated argument roles/entities as inputs to perform reasoning. However, the state-of-the-art video event-relation prediction system shows the necessity of using continuous feature vectors from input videos; existing methods based solely on SSR inputs fail completely, even when given oracle event types and argument roles. In this paper, we conduct an extensive empirical analysis to answer the following questions: 1) why SSR-based methods failed; 2) how to properly understand the evaluation setting of video event-relation prediction; 3) how to uncover the potential of SSR-based…
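The abstract describes SSR-based methods as taking an event's type and its argument roles/entities directly as input. One way to picture this is to serialize each event's structure into text and pair two events for a relation classifier. This is a minimal sketch under that assumption only: the `Event` class, the `relation_input` helper, the role names (`Arg0`, `Arg1`), and the `[SEP]` separator are hypothetical illustrations, not the authors' input format.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical SSR of one video event: an event type plus a mapping
    from argument roles to entity descriptions."""
    event_type: str
    arguments: dict[str, str] = field(default_factory=dict)

    def to_text(self) -> str:
        # Sort roles so the same structure always yields the same string.
        parts = [f"event: {self.event_type}"]
        parts += [f"{role}: {entity}" for role, entity in sorted(self.arguments.items())]
        return " | ".join(parts)


def relation_input(first: Event, second: Event, sep: str = " [SEP] ") -> str:
    """Join two serialized events into one sequence that a text-based
    relation classifier could consume (hypothetical format)."""
    return first.to_text() + sep + second.to_text()


a = Event("push", {"Arg0": "man in red shirt", "Arg1": "door"})
b = Event("fall", {"Arg1": "door"})
print(relation_input(a, b))
# event: push | Arg0: man in red shirt | Arg1: door [SEP] event: fall | Arg1: door
```

Because the input is purely symbolic, the same classifier can be fed oracle structures or structures predicted from video, which matches the oracle-input setting the abstract mentions.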

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition