What Matters for Ad-hoc Video Search? A Large-scale Evaluation on TRECVID

Aozhu Chen; Fan Hu; Zihan Wang; Fangming Zhou; Xirong Li

arXiv:2109.01774 · cs.MM · September 8, 2021 · 1 cites

Open Access

TL;DR

This paper presents a large-scale, systematic evaluation of various components in ad-hoc video search solutions using TRECVID data, providing insights into what factors most influence performance.

Contribution

It introduces a comprehensive evaluation framework that compares different models, features, and training data combinations to identify key factors affecting AVS performance.

Findings

01. Certain visual features significantly improve search accuracy.

02. Cross-modal matching models vary in effectiveness.

03. Training data quality impacts overall system performance.

Abstract

For quantifying progress in Ad-hoc Video Search (AVS), the annual TRECVID AVS task is an important international evaluation. Solutions submitted by the task participants vary in terms of their choices of cross-modal matching models, visual features and training data. As such, what one may conclude from the evaluation is at a high level that is insufficient to reveal the influence of the individual components. In order to bridge the gap between the current solution-level comparison and the desired component-wise comparison, we propose in this paper a large-scale and systematic evaluation on TRECVID. By selected combinations of state-of-the-art matching models, visual features and (pre-)training data, we construct a set of 25 different solutions and evaluate them on the TRECVID AVS tasks 2016–2020. The presented evaluation helps answer the key question of what matters for AVS. The…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition