Multi-Task Edge Prediction in Temporally-Dynamic Video Graphs

Osman Ülger; Julian Wiederer; Mohsen Ghafoorian; Vasileios Belagiannis; Pascal Mettes

arXiv:2212.02875·cs.CV·December 7, 2022

Open Access

TL;DR

This paper introduces MTD-GNN, a novel graph neural network designed to predict evolving multi-type relations in dynamic video scene graphs, improving over static and existing spatio-temporal models.

Contribution

The paper presents a new factorized spatio-temporal attention layer and a multi-task loss for multi-relation edge prediction in dynamic video graphs.
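As a rough illustration of what a multi-task loss for multi-relation edge prediction can look like, the sketch below sums one softmax cross-entropy term per relation type, each averaged over the edges. This is an assumption for illustration, not the paper's exact formulation: the task names (`spatial`, `contact`), the per-task weights, and the function names are all hypothetical.

```python
import math

def softmax_cross_entropy(logits, target):
    """Cross-entropy of one softmax prediction against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def multi_task_edge_loss(edge_logits, edge_labels, task_weights=None):
    """Weighted sum over relation types of the mean cross-entropy over edges.

    edge_logits:  {task_name: [logits for edge 0, logits for edge 1, ...]}
    edge_labels:  {task_name: [class index for edge 0, ...]}
    task_weights: optional {task_name: float}; 1.0 per task if omitted
                  (an assumption, not taken from the paper).
    """
    total = 0.0
    for task, logits_per_edge in edge_logits.items():
        labels = edge_labels[task]
        weight = 1.0 if task_weights is None else task_weights.get(task, 1.0)
        per_edge = [softmax_cross_entropy(l, y)
                    for l, y in zip(logits_per_edge, labels)]
        total += weight * sum(per_edge) / len(per_edge)
    return total

# Two edges and two hypothetical relation types with 3 and 2 classes.
logits = {
    "spatial": [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    "contact": [[0.0, 0.0], [1.0, -1.0]],
}
labels = {"spatial": [0, 2], "contact": [1, 0]}
loss = multi_task_edge_loss(logits, labels)
```

Because every relation type contributes to a single scalar, one backward pass would update a shared node encoder with the signal from all relation types at once, which is the usual motivation for training such tasks jointly.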

Findings

01  Outperforms existing static and spatio-temporal GNNs
02  Effectively models multiple relation types simultaneously
03  Improves predicate classification accuracy on ActionGenome and CLEVRER

Abstract

Graph neural networks have been shown to learn effective node representations, enabling node-, link-, and graph-level inference. Conventional graph networks assume static relations between nodes, whereas relations between entities in a video often evolve over time, with nodes entering and exiting dynamically. In such temporally-dynamic graphs, a core problem is inferring the future state of spatio-temporal edges, which can constitute multiple types of relations. To address this problem, we propose MTD-GNN, a graph network for predicting temporally-dynamic edges for multiple types of relations. We propose a factorized spatio-temporal graph attention layer to learn dynamic node representations and present a multi-task edge prediction loss that models multiple relations simultaneously. The proposed architecture operates on top of scene graphs that we obtain from videos through object detection…
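To make the idea of factorized spatio-temporal attention concrete, here is a minimal sketch that first lets each node attend over the nodes in its own frame (spatial step) and then over its own history (temporal step), rather than attending jointly over all node-time pairs. It is a simplification under stated assumptions and not the layer from the paper: learned projections, multiple heads, and edge features are omitted, and the function names and the example nodes (`person`, `cup`) are hypothetical.

```python
import math

def attend(query, keys, values):
    """Scaled dot-product attention of one query over lists of key/value vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [sum(w / z * v[j] for w, v in zip(weights, values)) for j in range(d)]

def factorized_st_attention(frames):
    """frames: list over time of {node_id: feature vector}; nodes may enter or exit.

    Step 1 (spatial): each node attends over the nodes present in its own frame.
    Step 2 (temporal): each node attends over its own spatial outputs from the
    current and earlier frames in which it was present.
    """
    spatial = []
    for frame in frames:
        ids = list(frame)
        feats = [frame[i] for i in ids]
        spatial.append({i: attend(frame[i], feats, feats) for i in ids})

    output = []
    for t, frame in enumerate(spatial):
        out_t = {}
        for i, h in frame.items():
            history = [spatial[s][i] for s in range(t + 1) if i in spatial[s]]
            out_t[i] = attend(h, history, history)
        output.append(out_t)
    return output

# Three frames: "cup" enters at t=1 and "person" exits at t=2.
frames = [
    {"person": [1.0, 0.0]},
    {"person": [1.0, 0.0], "cup": [0.0, 1.0]},
    {"cup": [0.0, 1.0]},
]
out = factorized_st_attention(frames)
```

Storing each frame as a dictionary keyed by node identifier is one simple way to handle nodes that enter and exit: a node contributes to attention only in the frames where it actually appears.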

Taxonomy

Topics: Advanced Graph Neural Networks · Human Pose and Action Recognition · Multimodal Machine Learning Applications