VRAG: Region Attention Graphs for Content-Based Video Retrieval

Kennard Ng; Ser-Nam Lim; Gim Hee Lee

arXiv:2205.09068 · cs.CV · May 19, 2022 · 6 citations


TL;DR

VRAG introduces a region attention graph network for content-based video retrieval. It captures spatio-temporal relations at the region level and achieves state-of-the-art results while keeping the efficiency of fixed-size video-level embeddings.

Contribution

The paper presents VRAG, a novel region-level graph network that models semantic relations in videos, improving retrieval accuracy over existing video-level methods.
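To make the region-graph idea concrete, here is a minimal NumPy sketch. It assumes each video has already been reduced to N region feature vectors of dimension D. The function names, the use of a single scaled dot-product attention map as the adjacency matrix, the single graph-convolution layer, and the mean pooling are all illustrative assumptions, not the authors' architecture. The sketch shows the two properties the paper relies on: edge weights come from semantic content, and the pooled embedding has a fixed size that does not depend on the order of the regions.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def region_attention_graph_layer(
    regions: np.ndarray, w_q: np.ndarray, w_k: np.ndarray, w_g: np.ndarray
) -> np.ndarray:
    """Hypothetical attention-weighted graph convolution over region nodes.

    regions: (N, D) region features pooled from all frames of one video.
    w_q, w_k: (D, D_k) query/key projections used to build the adjacency.
    w_g: (D, D_out) graph-convolution weight.
    """
    q = regions @ w_q
    k = regions @ w_k
    # Edge weights come from semantic similarity (scaled dot-product attention),
    # so neighbours are defined by content, not by spatial or temporal position.
    adjacency = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)
    # Graph convolution: each node aggregates all nodes weighted by the
    # adjacency, then a shared linear map and a ReLU are applied.
    return np.maximum(adjacency @ regions @ w_g, 0.0)


def video_embedding(regions, w_q, w_k, w_g) -> np.ndarray:
    """Fixed-size video embedding: mean-pool node features, then L2-normalise."""
    nodes = region_attention_graph_layer(regions, w_q, w_k, w_g)
    emb = nodes.mean(axis=0)  # mean over nodes is permutation invariant
    return emb / (np.linalg.norm(emb) + 1e-12)


rng = np.random.default_rng(0)
regions = rng.normal(size=(12, 8))  # e.g. 3 frames x 4 regions, 8-dim features
w_q, w_k, w_g = (rng.normal(size=(8, 8)) for _ in range(3))
emb = video_embedding(regions, w_q, w_k, w_g)
shuffled = regions[rng.permutation(12)]
print(np.allclose(emb, video_embedding(shuffled, w_q, w_k, w_g)))  # True
```

Shuffling the regions leaves the embedding unchanged up to floating-point error, and a video with a different number of regions still maps to a vector of the same length. This is what allows a database to store a single fixed-size vector per video.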

Findings

01 Achieves a new state of the art in video retrieval

02 Shot-level VRAG outperforms other video-level methods

03 Narrows the performance gap to frame-level methods while running faster than them
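The contrast between video-level and shot-level matching in findings 02 and 03 can be illustrated with a small sketch. It assumes each video has already been split into shots, with one embedding per shot. The Chamfer-style score (each query shot is matched to its best database shot and the scores are averaged) and the use of the mean of shot embeddings as the video-level vector are hypothetical choices for illustration. They are not the paper's similarity function.

```python
import numpy as np


def l2_normalise(x: np.ndarray) -> np.ndarray:
    """Scale vectors (along the last axis) to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def video_level_similarity(q_emb: np.ndarray, d_emb: np.ndarray) -> float:
    """One dot product between two fixed-size video embeddings (cosine)."""
    return float(l2_normalise(q_emb) @ l2_normalise(d_emb))


def shot_level_similarity(q_shots: np.ndarray, d_shots: np.ndarray) -> float:
    """Hypothetical Chamfer-style score between two sets of shot embeddings.

    q_shots: (S_q, D), d_shots: (S_d, D). Each query shot is matched to its
    most similar database shot, and the best-match scores are averaged.
    """
    sims = l2_normalise(q_shots) @ l2_normalise(d_shots).T  # (S_q, S_d)
    return float(sims.max(axis=1).mean())


shared = np.eye(6)[:2]     # two shots that appear in both videos
unrelated = np.eye(6)[2:5]  # three shots that appear only in the database video
query, database = shared, np.vstack([shared, unrelated])
print(shot_level_similarity(query, database))  # 1.0
# Mean-pooling into one vector dilutes the match: 2 / sqrt(10) ≈ 0.632
print(video_level_similarity(query.mean(axis=0), database.mean(axis=0)))
```

In this toy case, pooling everything into one vector lets the unrelated shots dilute the match, while shot-level matching still finds it. The cost is S_q × S_d comparisons per video pair instead of one. That places shot-level matching between a single video-level dot product and a frame-by-frame comparison in both cost and granularity.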

Abstract

Content-based Video Retrieval (CBVR) is used on media-sharing platforms for applications such as video recommendation and filtering. To manage databases that scale to billions of videos, video-level approaches that use fixed-size embeddings are preferred due to their efficiency. In this paper, we introduce Video Region Attention Graph Networks (VRAG), which improve the state of the art of video-level methods. We represent videos at a finer granularity via region-level features and encode video spatio-temporal dynamics through region-level relations. Our VRAG captures the relationships between regions based on their semantic content via self-attention and the permutation-invariant aggregation of Graph Convolution. In addition, we show that the performance gap between video-level and frame-level methods can be reduced by segmenting videos into shots and using shot embeddings for video…
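The efficiency argument for fixed-size embeddings reduces to this: retrieval becomes a single matrix-vector product over the database. Below is a minimal exact-search sketch, assuming L2-normalised embeddings and cosine similarity. `build_index` and `search` are hypothetical helpers. A system at the billion-video scale mentioned above would typically replace this brute-force scan with an approximate nearest-neighbour index.

```python
import numpy as np


def build_index(embeddings: np.ndarray) -> np.ndarray:
    """Stack L2-normalised fixed-size video embeddings into an (N, D) matrix."""
    return embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)


def search(index: np.ndarray, query: np.ndarray, k: int) -> list[tuple[int, float]]:
    """Exact top-k cosine-similarity search with one matrix-vector product."""
    q = query / np.linalg.norm(query)
    scores = index @ q                         # (N,) cosine similarities
    k = min(k, len(scores))
    top = np.argpartition(-scores, k - 1)[:k]  # unordered top-k in O(N)
    top = top[np.argsort(-scores[top])]        # sort only the k winners
    return [(int(i), float(scores[i])) for i in top]


rng = np.random.default_rng(0)
db = build_index(rng.normal(size=(1000, 64)))  # 1000 videos, 64-dim embeddings
hits = search(db, db[42] + 0.01 * rng.normal(size=64), k=3)
print(hits[0][0])  # 42
```

Each video costs D floats of storage and one dot product per query, whatever its length. This is the property that makes video-level methods attractive at scale.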

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: Convolution