Counterfactual Cross-modality Reasoning for Weakly Supervised Video Moment Localization

Zezhong Lv; Bing Su; Ji-Rong Wen

arXiv:2308.05648·cs.CV·October 17, 2023

TL;DR

This paper introduces a counterfactual cross-modality reasoning approach to improve weakly supervised video moment localization by reducing spurious correlations and enhancing vision-language alignment.

Contribution

It proposes a novel counterfactual reasoning method that explicitly models and suppresses spurious effects in cross-modality reconstruction for better localization.

Findings

1. Significant improvement over existing weakly supervised methods

2. Effective mitigation of spurious correlations in cross-modality learning

3. Enhanced accuracy in video moment localization

Abstract

Video moment localization aims to retrieve the target segment of an untrimmed video according to a natural language query. Weakly supervised methods have gained attention recently, as the precise temporal location of the target segment is not always available. However, one of the greatest challenges for weakly supervised methods is the mismatch between video and language induced by the coarse temporal annotations. To refine the vision-language alignment, recent works contrast the cross-modality similarities, driven by reconstructing masked queries, between positive and negative video proposals. However, the reconstruction may be influenced by the latent spurious correlation between the unmasked and the masked parts, which distorts the restoring process and further degrades the efficacy of contrastive learning since the masked words are not completely reconstructed…
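
The contrast between positive and negative proposals described in the abstract can be illustrated with a minimal sketch. It assumes a hinge-style margin loss over per-proposal reconstruction losses, which is one common way to set up this kind of contrast and is not confirmed by the text above as the paper's formulation. The function names, the margin value, and the example probabilities are hypothetical.

```python
import math


def reconstruction_loss(word_probs):
    """Mean negative log-likelihood of the masked query words.

    word_probs holds the probability a (hypothetical) reconstructor assigns
    to each ground-truth masked word, conditioned on one video proposal.
    """
    return -sum(math.log(p) for p in word_probs) / len(word_probs)


def contrastive_margin_loss(rec_pos, rec_neg, margin=0.1):
    """Hinge loss (an assumed form): the positive proposal should
    reconstruct the masked words better than the negative one by `margin`."""
    return max(0.0, rec_pos - rec_neg + margin)


# Illustrative probabilities only, not results from the paper.
pos_probs = [0.8, 0.6]  # masked words given the positive proposal
neg_probs = [0.3, 0.2]  # same masked words given a negative proposal

rec_pos = reconstruction_loss(pos_probs)
rec_neg = reconstruction_loss(neg_probs)
loss = contrastive_margin_loss(rec_pos, rec_neg)
```

In this setup the loss is zero once the positive proposal already reconstructs the masked words better than the negative one by at least the margin. If the unmasked words alone make the masked words easy to guess, both reconstruction losses shrink regardless of the video, which is the kind of spurious correlation the abstract points to.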

Code & Models

Repositories

sldz0306/ccr (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Learning