AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval
Riku Togashi, Mayu Otani, Yuta Nakashima, Esa Rahtu, Janne Heikkila, Tetsuya Sakai

TL;DR
This paper introduces AxIoU, a new evaluation measure for Video Moment Retrieval that addresses limitations of existing metrics by considering rank positions and localization quality, and is grounded in formal axioms.
Contribution
We propose AxIoU, an axiomatic evaluation measure for VMR that overcomes rank insensitivity and binarization issues of current metrics, with theoretical justification and empirical validation.
Findings
AxIoU satisfies key axioms for VMR evaluation.
AxIoU correlates well with existing metrics and is stable across data variations.
It provides a more nuanced assessment of localization quality.
Abstract
Evaluation measures have a crucial impact on the direction of research. Therefore, it is of utmost importance to develop appropriate and reliable evaluation measures for new applications where conventional measures are not well suited. Video Moment Retrieval (VMR) is one such application, and the current practice is to use R@K,θ for evaluating VMR systems. However, this measure has two disadvantages. First, it is rank-insensitive: it ignores the rank positions of successfully localised moments in the top-K ranked list by treating the list as a set. Second, it binarizes the Intersection over Union (IoU) of each retrieved video moment using the threshold θ and thereby ignores the fine-grained localisation quality of ranked moments. We propose an alternative measure for evaluating VMR, called Average Max IoU (AxIoU), which is free from the above two problems. We show that…
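The two disadvantages described in the abstract can be illustrated with a small sketch. It is not taken from the paper: the function names, the (start, end) moment representation, and the single ground-truth moment are all assumptions made for illustration. In particular, `axiou_at_k` implements one plausible reading of "Average Max IoU" (the mean, over cutoffs k = 1..K, of the best IoU among the top-k moments); the paper's exact definition may differ. `recall_at_k_theta` follows the thresholded, set-based behaviour the abstract attributes to the current measure, with K the list cutoff and θ the IoU threshold.

```python
from typing import List, Tuple

# Hypothetical representation: a moment is (start, end) in seconds.
Moment = Tuple[float, float]


def temporal_iou(pred: Moment, gold: Moment) -> float:
    """Intersection over Union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    union = (pred[1] - pred[0]) + (gold[1] - gold[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_k_theta(ranked: List[Moment], gold: Moment, k: int, theta: float) -> float:
    """1.0 if any of the top-k moments reaches IoU >= theta, else 0.0.

    The top-k list is treated as a set, so rank order has no effect,
    and IoU values are binarized by the threshold.
    """
    return float(any(temporal_iou(m, gold) >= theta for m in ranked[:k]))


def axiou_at_k(ranked: List[Moment], gold: Moment, k: int) -> float:
    """Assumed reading of Average Max IoU: mean over cutoffs 1..k of the
    running maximum IoU. Not confirmed against the paper's formula."""
    best = 0.0
    total = 0.0
    for i in range(k):
        if i < len(ranked):
            best = max(best, temporal_iou(ranked[i], gold))
        total += best  # best IoU seen within the top-(i + 1)
    return total / k


gold = (10.0, 20.0)
good_first = [(10.0, 20.0), (0.0, 5.0)]   # perfect moment at rank 1
good_second = [(0.0, 5.0), (10.0, 20.0)]  # same moments, order swapped

print(recall_at_k_theta(good_first, gold, 2, 0.5))   # 1.0
print(recall_at_k_theta(good_second, gold, 2, 0.5))  # 1.0
print(axiou_at_k(good_first, gold, 2))               # 1.0
print(axiou_at_k(good_second, gold, 2))              # 0.5
```

Under these assumptions, the two rankings are indistinguishable to the thresholded measure, while the averaged running maximum rewards placing the well-localised moment earlier and uses the raw IoU value rather than a binary hit.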
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Video Analysis and Summarization
