Graph Neural Network for Video Relocalization
Yuan Zhou, Mingfei Wang, Ruolin Wang, Shuwei Huo

TL;DR
This paper introduces a graph neural network approach for video relocalization, addressing the inconsistency between frame-level and video-level feature similarities, leading to improved retrieval accuracy.
Contribution
It proposes a Multi-Graph Feature Fusion Module that treats the query and proposal video features as a graph, with each timestep as a node, to better capture relations between frames for relocalization.
Findings
Outperforms state-of-the-art methods on ActivityNet v1.2.
Outperforms state-of-the-art methods on Thumos14.
Effectively models feature relations with graph neural networks.
Abstract
In this paper, we focus on the video relocalization task, which takes a query video clip as input and retrieves a semantically related clip from another untrimmed long video. We find that in video relocalization datasets there is no consistent relationship between frame-level feature similarity and video-level feature similarity, which affects feature fusion among frames. However, existing video relocalization methods do not fully consider this. Taking this phenomenon into account, we treat video features as a graph by concatenating the query video feature and the proposal video feature along the time dimension, where each timestep is treated as a node and each row of the feature matrix is treated as that node's feature. Then, with the power of graph neural networks, we propose a Multi-Graph Feature Fusion Module to fuse the relation feature…
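The graph construction described in the abstract (concatenate query and proposal features along the time dimension, one node per timestep) can be illustrated with a minimal NumPy sketch. Everything beyond that concatenation is an assumption made for illustration and is not taken from the paper: the cosine-similarity adjacency, the single propagation step, the feature sizes, and the function names `build_graph_nodes`, `cosine_adjacency`, and `propagate` are all hypothetical, and this is not the authors' Multi-Graph Feature Fusion Module.

```python
import numpy as np

def build_graph_nodes(query_feat, proposal_feat):
    """Stack query and proposal features along the time axis.

    Each row (timestep) of the result is one graph node.
    """
    return np.concatenate([query_feat, proposal_feat], axis=0)

def cosine_adjacency(nodes, eps=1e-8):
    """ASSUMPTION: edge weights from cosine similarity between node features.

    Negative similarities are clipped to zero and each row is normalized
    so that a node's incoming weights sum to (approximately) one.
    """
    normed = nodes / (np.linalg.norm(nodes, axis=1, keepdims=True) + eps)
    sim = normed @ normed.T
    adj = np.maximum(sim, 0.0)
    return adj / (adj.sum(axis=1, keepdims=True) + eps)

def propagate(nodes, adj, weight):
    """One generic message-passing step: ReLU(A X W).

    ASSUMPTION: a plain graph-convolution-style update, used here only
    to show how node features could be fused across timesteps.
    """
    return np.maximum(adj @ nodes @ weight, 0.0)

# Illustrative sizes: 4 query timesteps, 6 proposal timesteps, 8-dim features.
rng = np.random.default_rng(0)
query_feat = rng.standard_normal((4, 8))
proposal_feat = rng.standard_normal((6, 8))

nodes = build_graph_nodes(query_feat, proposal_feat)  # shape (10, 8)
adj = cosine_adjacency(nodes)                         # shape (10, 10)
weight = rng.standard_normal((8, 8)) * 0.1            # stand-in for a learned matrix
fused = propagate(nodes, adj, weight)                 # shape (10, 8)
```

Because the query and proposal timesteps share one node set, an update like this lets every query frame exchange information with every proposal frame, which is the kind of cross-video relation the abstract says the graph formulation is meant to capture.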
Taxonomy
Topics: Advanced Graph Neural Networks · Video Analysis and Summarization · Multimodal Machine Learning Applications
