On Semantic Similarity in Video Retrieval

Michael Wray; Hazel Doughty; Dima Damen

arXiv:2103.10095·cs.CV·March 19, 2021

TL;DR

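This paper challenges the instance-based evaluation used in video retrieval, which assumes that only a single caption is relevant to each video and vice versa. It proposes semantic similarity retrieval, in which multiple items can be equally relevant and results are ranked by their similarity to the query, and it introduces proxies that estimate these similarities in large-scale datasets without additional annotations.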

Contribution

It introduces a semantic similarity framework for video retrieval and proposes proxies to estimate similarities without extra annotations.

Findings

01

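Instance-based evaluation often yields performance comparisons that are not indicative of models' retrieval capabilities; semantic similarity evaluation instead allows multiple videos/captions to be equally relevant to a query.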

02

Proxies enable large-scale similarity estimation without additional labels.

03

The analysis covers three commonly used video retrieval datasets: MSR-VTT, YouCook2 and EPIC-KITCHENS.
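
To make the idea of a similarity proxy concrete, here is a minimal sketch of one possible proxy: word-set overlap (intersection over union) between captions. This summary does not specify which proxies the paper uses, so the function name `bow_similarity`, the whitespace tokenisation, and the example captions are all illustrative assumptions rather than the authors' method.

```python
def bow_similarity(caption_a: str, caption_b: str) -> float:
    """Hypothetical proxy: intersection-over-union of the captions' word sets."""
    words_a = set(caption_a.lower().split())
    words_b = set(caption_b.lower().split())
    if not words_a and not words_b:
        return 0.0  # two empty captions share no evidence of similarity
    return len(words_a & words_b) / len(words_a | words_b)


# Made-up captions, used only to illustrate the computation.
captions = [
    "cut the onion",
    "slice the onion",
    "wash the pan",
]

# Pairwise similarity matrix derived from the existing captions alone,
# i.e. without collecting any additional annotations.
similarity = [[bow_similarity(a, b) for b in captions] for a in captions]

for caption, row in zip(captions, similarity):
    print(caption, [round(value, 2) for value in row])
```

A proxy of this kind assigns partial relevance to captions that share words with the query's own caption, which is what allows more than one item to count as relevant. A plain word-overlap measure ignores synonyms and stop words, so it should be read as a starting point only.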

Abstract

Current video retrieval efforts all found their evaluation on an instance-based assumption, that only a single caption is relevant to a query video and vice versa. We demonstrate that this assumption results in performance comparisons often not indicative of models' retrieval capabilities. We propose a move to semantic similarity video retrieval, where (i) multiple videos/captions can be deemed equally relevant, and their relative ranking does not affect a method's reported performance and (ii) retrieved videos/captions are ranked by their similarity to a query. We propose several proxies to estimate semantic similarities in large-scale retrieval datasets, without additional annotations. Our analysis is performed on three commonly used video retrieval datasets (MSR-VTT, YouCook2 and EPIC-KITCHENS).
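
Property (i) in the abstract can be illustrated with a small sketch that contrasts an instance-based metric (Recall@1) with a graded-relevance metric. The abstract does not name the metric used, so normalised discounted cumulative gain (nDCG) is assumed here as a standard choice; the relevance scores and rankings are made-up values for illustration.

```python
import math


def recall_at_k(ranking, gt_index, k):
    """Instance-based: 1 if the single ground-truth item is in the top k."""
    return 1.0 if gt_index in ranking[:k] else 0.0


def ndcg(ranking, relevance):
    """Graded relevance: discounted gain of the ranking over the ideal gain."""
    dcg = sum(relevance[item] / math.log2(pos + 2)
              for pos, item in enumerate(ranking))
    ideal = sorted(relevance, reverse=True)
    idcg = sum(rel / math.log2(pos + 2) for pos, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical semantic relevance of four captions to one query video.
# Captions 0 and 1 are equally relevant; only caption 0 is the paired
# "ground truth" under the instance-based assumption.
relevance = [1.0, 1.0, 0.2, 0.0]
gt_index = 0

ranking_a = [1, 0, 2, 3]  # ranks the equally relevant caption 1 first
ranking_b = [0, 1, 2, 3]  # ranks the paired caption 0 first

for name, ranking in [("A", ranking_a), ("B", ranking_b)]:
    print(name, recall_at_k(ranking, gt_index, 1), ndcg(ranking, relevance))
```

Under Recall@1, ranking A scores 0 and ranking B scores 1, even though both place an equally relevant caption first. Under nDCG with these relevance values, the two rankings receive the same score, because swapping items of equal relevance leaves the discounted gain unchanged.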
