Boosting Temporal Sentence Grounding via Causal Inference
Kefan Tang, Lihuo He, Jisheng Dang, Xinbo Gao

TL;DR
This paper introduces a causal inference framework for Temporal Sentence Grounding that reduces spurious correlations and improves model robustness by employing causal intervention and counterfactual reasoning.
Contribution
It proposes a novel causal inference-based approach for TSG, addressing biases and overfitting issues through textual intervention and visual counterfactuals.
Findings
Outperforms existing methods on public datasets.
Effectively reduces bias and improves generalization.
Demonstrates robustness against out-of-distribution data.
Abstract
Temporal Sentence Grounding (TSG) aims to identify relevant moments in an untrimmed video that semantically correspond to a given textual query. Although existing studies have made substantial progress, they often overlook the issue of spurious correlations between video and textual queries. These spurious correlations arise from two primary factors: (1) inherent biases in the textual data, such as frequent co-occurrences of specific verbs or phrases, and (2) the model's tendency to overfit to salient or repetitive patterns in video content. Such biases mislead the model into associating textual cues with incorrect visual moments, resulting in unreliable predictions and poor generalization to out-of-distribution examples. To overcome these limitations, we propose a novel TSG framework, causal intervention and counterfactual reasoning, that utilizes causal inference to eliminate spurious…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
