ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning

Giorgos Kordopatis-Zilos; Symeon Papadopoulos; Ioannis Patras; Ioannis Kompatsiaris

arXiv:1908.07410·cs.CV·August 21, 2019

TL;DR

ViSiL introduces a novel CNN-based architecture for fine-grained spatio-temporal video similarity learning, capturing detailed intra- and inter-frame relations to improve video retrieval accuracy.

Contribution

The paper presents a method that computes video-to-video similarity from refined frame-to-frame similarity matrices, so that features are not aggregated into a single frame or video descriptor before similarity is estimated.

Findings

01 Achieves large improvements over state-of-the-art on five benchmark datasets.

02 Effectively captures temporal similarity patterns between matching frame sequences.

03 Demonstrates robustness across four different video retrieval tasks.

Abstract

In this paper we introduce ViSiL, a Video Similarity Learning architecture that considers fine-grained Spatio-Temporal relations between pairs of videos -- such relations are typically lost in previous video retrieval approaches that embed the whole frame or even the whole video into a vector descriptor before the similarity estimation. By contrast, our Convolutional Neural Network (CNN)-based approach is trained to calculate video-to-video similarity from refined frame-to-frame similarity matrices, so as to consider both intra- and inter-frame relations. In the proposed method, pairwise frame similarity is estimated by applying Tensor Dot (TD) followed by Chamfer Similarity (CS) on regional CNN frame features - this avoids feature aggregation before the similarity calculation between frames. Subsequently, the similarity matrix between all video frames is fed to a four-layer CNN, and…
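The frame-to-frame step described in the abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code from the official repository: the helper names `l2_normalize` and `frame_similarity_matrix`, the array shapes, and the random inputs are hypothetical, and Chamfer Similarity is taken here as the mean, over the regions of the first frame, of the maximum similarity to any region of the second frame. The four-layer CNN that refines the resulting matrix is not shown.

```python
import numpy as np


def l2_normalize(features, eps=1e-12):
    """Scale each region vector to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(features, axis=-1, keepdims=True)
    return features / np.maximum(norms, eps)


def frame_similarity_matrix(query, target):
    """Frame-to-frame similarity from regional features (illustrative sketch).

    query:  array of shape (X, R, D), X frames with R regions of dimension D
    target: array of shape (Y, R, D)
    returns an (X, Y) matrix of frame-to-frame similarities
    """
    # Tensor Dot over the feature dimension: every region of every query
    # frame against every region of every target frame -> (X, R, Y, R).
    region_sim = np.tensordot(query, target, axes=([2], [2]))
    # Chamfer Similarity (assumed form): best-matching target region for
    # each query region, averaged over the query regions -> (X, Y).
    return region_sim.max(axis=3).mean(axis=1)


# Hypothetical inputs: 4 and 6 frames, 9 regions per frame, 16-d features.
rng = np.random.default_rng(0)
query = l2_normalize(rng.normal(size=(4, 9, 16)))
target = l2_normalize(rng.normal(size=(6, 9, 16)))
sim = frame_similarity_matrix(query, target)
print(sim.shape)  # (4, 6)
```

Because no pooling happens before the dot product, each entry of the matrix still reflects which regions of the two frames matched, which is the point the abstract makes about avoiding early feature aggregation.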

Code & Models

Repositories

MKLab-ITI/visil (TensorFlow, official)

Taxonomy

Methods: Triplet Loss