Multi-Task Edge Prediction in Temporally-Dynamic Video Graphs
Osman Ülger, Julian Wiederer, Mohsen Ghafoorian, Vasileios Belagiannis, Pascal Mettes

TL;DR
This paper introduces MTD-GNN, a novel graph neural network designed to predict evolving multi-type relations in dynamic video scene graphs, improving over static and existing spatio-temporal models.
Contribution
The paper presents a new factorized spatio-temporal attention layer and a multi-task loss for multi-relation edge prediction in dynamic video graphs.
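To make the idea of a multi-task edge loss concrete, here is a minimal sketch in plain Python. It is not the paper's formulation, which this summary does not give. It assumes each relation type is a separate single-label classification task per edge, and that the per-task losses are combined as a weighted sum. The task names `spatial` and `contact`, the function names, and the `task_weights` parameter are all hypothetical.

```python
import math

def softmax_cross_entropy(logits, target):
    """Cross-entropy of one logit vector against an integer class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def multi_task_edge_loss(edge_logits, edge_targets, task_weights=None):
    """Average the per-task cross-entropy over edges, then sum over tasks.

    edge_logits:  {task_name: [logit vector per edge]}
    edge_targets: {task_name: [class index per edge]}
    task_weights: optional {task_name: float}; assumed uniform if omitted.
    """
    task_weights = task_weights or {task: 1.0 for task in edge_logits}
    total = 0.0
    for task, logits_per_edge in edge_logits.items():
        targets = edge_targets[task]
        losses = [softmax_cross_entropy(l, t)
                  for l, t in zip(logits_per_edge, targets)]
        total += task_weights[task] * sum(losses) / len(losses)
    return total

# One edge, two hypothetical relation types with 2 and 4 classes.
logits = {"spatial": [[0.0, 0.0]], "contact": [[0.0, 0.0, 0.0, 0.0]]}
targets = {"spatial": [0], "contact": [3]}
loss = multi_task_edge_loss(logits, targets)
```

With uniform logits, each task contributes the log of its class count, so the combined loss here equals log 2 + log 4. If a relation type allows several labels per edge, a per-class binary loss would replace the softmax term.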
Findings
Outperforms existing static and spatio-temporal GNNs
Effectively models multiple relation types simultaneously
Improves predicate classification accuracy on ActionGenome and CLEVRER
Abstract
Graph neural networks have been shown to learn effective node representations, enabling node-, link-, and graph-level inference. Conventional graph networks assume static relations between nodes, while relations between entities in a video often evolve over time, with nodes entering and exiting dynamically. In such temporally-dynamic graphs, a core problem is inferring the future state of spatio-temporal edges, which can constitute multiple types of relations. To address this problem, we propose MTD-GNN, a graph network for predicting temporally-dynamic edges for multiple types of relations. We propose a factorized spatio-temporal graph attention layer to learn dynamic node representations and present a multi-task edge prediction loss that models multiple relations simultaneously. The proposed architecture operates on top of scene graphs that we obtain from videos through object detection…
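The factorization mentioned in the abstract can be illustrated with a small sketch. It rests on assumptions the abstract does not confirm: attention is applied first across the nodes within each frame, then across each node's own states over time; the temporal step looks only at the current and earlier frames; and learned projections and multiple heads are left out for brevity. The names `attend` and `factorized_st_attention` are hypothetical.

```python
import math

def attend(query, keys, values):
    """Scaled dot-product attention of one query over lists of keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    dim = len(values[0])
    return [sum(w / z * v[d] for w, v in zip(weights, values))
            for d in range(dim)]

def factorized_st_attention(frames):
    """frames[t] maps node id -> feature vector; nodes may enter or exit."""
    # Spatial step: each node attends over the nodes present in its own frame.
    spatial = []
    for frame in frames:
        feats = list(frame.values())
        spatial.append({i: attend(f, feats, feats) for i, f in frame.items()})
    # Temporal step: each node attends over its own current and past states.
    output = []
    for t, frame in enumerate(spatial):
        out = {}
        for i, feat in frame.items():
            history = [spatial[s][i] for s in range(t + 1) if i in spatial[s]]
            out[i] = attend(feat, history, history)
        output.append(out)
    return output

# Two frames; the hypothetical node "cup" enters in the second frame.
frames = [
    {"person": [1.0, 0.0]},
    {"person": [1.0, 0.0], "cup": [0.0, 1.0]},
]
out = factorized_st_attention(frames)
```

Splitting attention this way means each step attends over either the nodes of one frame or the history of one node, rather than over all node-time pairs at once. Nodes absent from a frame are skipped in both steps.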
Taxonomy
Topics: Advanced Graph Neural Networks · Human Pose and Action Recognition · Multimodal Machine Learning Applications
