A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach
Xiaohan Lan, Yitian Yuan, Xin Wang, Long Chen, Zhi Wang, Lin Ma, and Wenwu Zhu

TL;DR
This paper critically examines biases in datasets and evaluation metrics for Temporal Sentence Grounding in Videos, proposing re-organized datasets, a new metric, and a causality-based debiasing framework to improve benchmarking and unbiased moment prediction.
Contribution
The paper introduces re-organized versions of two widely used datasets with out-of-distribution (OOD) test splits, a new evaluation metric that reduces the effect of annotation bias on reported scores, and a causality-based multi-branch debiasing framework for more accurate grounding.
Findings
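Re-organized datasets with OOD test splits reduce the influence of moment annotation biases on benchmarking.
The new metric 'dR@n,IoU@m' discounts the basic recall scores, giving a less inflated picture of model performance.
The proposed multi-branch debiasing framework (MDD) improves unbiased moment prediction results.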
Abstract
Temporal Sentence Grounding in Videos (TSGV), which aims to ground a natural language sentence in an untrimmed video, has drawn widespread attention over the past few years. However, recent studies have found that current benchmark datasets may have obvious moment annotation biases, enabling several simple baselines even without training to achieve SOTA performance. In this paper, we take a closer look at existing evaluation protocols, and find both the prevailing dataset and evaluation metrics are the devils that lead to untrustworthy benchmarking. Therefore, we propose to re-organize the two widely-used datasets, making the ground-truth moment distributions different in the training and test splits, i.e., out-of-distribution (OOD) test. Meanwhile, we introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by…
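The idea of discounting a recall score can be illustrated with a minimal sketch. The abstract excerpt above is truncated and does not give the formula, so everything below is an assumption made for illustration: a prediction counts as a hit when its temporal IoU with the ground truth is at least m, and each hit is then scaled by two factors that shrink as the predicted start and end drift from the ground-truth boundaries, normalized by video duration. The names temporal_iou and discounted_recall, the use of the best-scoring hit among the top n, and the >= threshold are hypothetical choices, not the authors' reference implementation.

```python
from typing import List, Tuple

Moment = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(pred: Moment, gt: Moment) -> float:
    """Intersection over union of two temporal segments."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def discounted_recall(
    preds: List[List[Moment]],  # ranked candidate moments per query
    gts: List[Moment],          # one ground-truth moment per query
    durations: List[float],     # video duration per query, in seconds
    n: int,
    m: float,
) -> float:
    """Illustrative discounted recall (assumed form, see lead-in).

    Plain R@n,IoU@m would add 1.0 for every query with a hit in the
    top n. Here each hit is instead weighted by how close its
    boundaries are to the ground truth, so loose hits earn less.
    """
    total = 0.0
    for cands, gt, dur in zip(preds, gts, durations):
        best = 0.0
        for p in cands[:n]:
            if temporal_iou(p, gt) >= m:
                # Assumed discount: 1 minus normalized boundary error.
                alpha_s = 1.0 - abs(p[0] - gt[0]) / dur
                alpha_e = 1.0 - abs(p[1] - gt[1]) / dur
                best = max(best, alpha_s * alpha_e)
        total += best
    return total / len(gts)


if __name__ == "__main__":
    preds = [[(0.0, 10.0)], [(2.0, 10.0)], [(15.0, 20.0)]]
    gts = [(0.0, 10.0), (0.0, 10.0), (0.0, 10.0)]
    durations = [20.0, 20.0, 20.0]
    print(discounted_recall(preds, gts, durations, n=1, m=0.5))
```

In this toy run the second prediction passes the IoU threshold (IoU 0.8) but starts two seconds late in a 20-second video, so under the assumed discount it contributes 0.9 rather than the full 1.0 that plain recall would award.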
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Topic Modeling
Methods: Balanced Selection
