Towards Debiasing Frame Length Bias in Text-Video Retrieval via Causal Intervention
Burak Satar, Hongyuan Zhu, Hanwang Zhang, Joo Hwee Lim

TL;DR
This paper identifies and addresses a temporal frame length bias in text-video retrieval, proposing a causal intervention method that improves semantic relevance metrics and mitigates bias across multiple datasets.
Contribution
To the authors' knowledge, this is the first systematic study of frame length bias in text-video retrieval. It introduces a causal debiasing approach that is reported to outperform baselines and SOTA methods.
Findings
The proposed method reduces the effects of frame length bias.
The model achieves higher nDCG scores than the baselines on multiple datasets.
Bias mitigation improves semantic relevance in retrieval results.
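For readers unfamiliar with the nDCG metric mentioned in the findings, here is a minimal sketch of its standard linear-gain form. The function names `dcg` and `ndcg` and the example relevance values are illustrative assumptions. This summary does not state how the paper defines relevance between a caption and a video, so this should not be read as the authors' evaluation code.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance scores given in ranked order."""
    # Rank 0 is the top result; the discount is log2(position + 1) with
    # 1-based positions, i.e. log2(rank + 2) with 0-based ranks.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(ranked_relevances):
    """Normalise DCG by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    # Assumption: a query with no relevant items scores 0 rather than raising.
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance scores of retrieved videos for one text query,
# listed in the order the model ranked them.
score = ndcg([0.5, 1.0, 0.0])
```

Unlike recall at k, which counts only the single ground-truth match, nDCG gives partial credit to semantically related items. This is why it suits the "semantic relevance" framing used in the findings.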
Abstract
Many studies focus on improving pretraining or developing new backbones in text-video retrieval. However, existing methods may suffer from the learning and inference bias issue, as recent research suggests in other text-video-related tasks. For instance, spatial appearance features on action recognition or temporal object co-occurrences on video scene graph generation could induce spurious correlations. In this work, we present a unique and systematic study of a temporal bias due to frame length discrepancy between training and test sets of trimmed video clips, which is the first such attempt for a text-video retrieval task, to the best of our knowledge. We first hypothesise and verify the bias on how it would affect the model illustrated with a baseline study. Then, we propose a causal debiasing approach and perform extensive experiments and ablation studies on the Epic-Kitchens-100,…
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization
