LeAdQA: LLM-Driven Context-Aware Temporal Grounding for Video Question Answering
Xinxin Dong, Baoyun Peng, Haokai Ma, Yufei Wang, Zixuan Dong, Fei Hu, Xiaodong Wang

TL;DR
LeAdQA combines LLM-driven, causal-aware query refinement with fine-grained visual grounding and adaptive fusion to retrieve precise video segments, improving complex reasoning in VideoQA tasks.
Contribution
This paper presents LeAdQA, a VideoQA approach that integrates causal-aware query reformulation with targeted visual grounding. It addresses two limitations of previous methods: task-agnostic frame sampling and heuristic segment retrieval.
Findings
Achieves state-of-the-art performance on NExT-QA, IntentQA, and NExT-GQA datasets.
Demonstrates improved understanding of causal-temporal structures in videos.
Enhances reasoning accuracy while maintaining computational efficiency.
Abstract
Video Question Answering (VideoQA) requires identifying sparse critical moments in long videos and reasoning about their causal relationships to answer semantically complex questions. While recent advances in multimodal learning have improved alignment and fusion, current approaches remain limited by two prevalent but fundamentally flawed strategies: (1) task-agnostic sampling indiscriminately processes all frames, overwhelming key events with irrelevant content; and (2) heuristic retrieval captures superficial patterns but misses the causal-temporal structures needed for complex reasoning. To address these challenges, we introduce LeAdQA, an approach that bridges these gaps by combining causal-aware query refinement with fine-grained visual grounding. Our method first leverages LLMs to reformulate question-option pairs, resolving causal ambiguities and sharpening temporal…
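The abstract contrasts task-agnostic sampling with retrieval targeted at the question. The Python sketch below illustrates that contrast under stated assumptions and is not the authors' implementation. The interval lists stand in for whatever segments a grounding model might return for each LLM-reformulated query, and `merge_intervals` is a plain interval union used as a hypothetical placeholder for the paper's adaptive fusion, which the abstract does not specify. All function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec); hypothetical segment format


def uniform_sample(duration: float, num_frames: int) -> List[float]:
    """Task-agnostic baseline: evenly spaced timestamps over the whole video."""
    step = duration / num_frames
    return [step * (i + 0.5) for i in range(num_frames)]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Union of possibly overlapping grounded segments.

    Hypothetical stand-in for fusion across several reformulated queries;
    the paper's adaptive fusion is not described in the abstract.
    """
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous segment: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def grounded_sample(intervals: List[Interval], num_frames: int) -> List[float]:
    """Spend the whole frame budget inside the merged segments, proportional to length."""
    segments = merge_intervals(intervals)
    total = sum(end - start for start, end in segments)
    if total <= 0:
        raise ValueError("grounded segments have zero total length")
    step = total / num_frames
    timestamps: List[float] = []
    for i in range(num_frames):
        offset = step * (i + 0.5)  # position along the concatenated segments
        for start, end in segments:
            length = end - start
            if offset < length:
                timestamps.append(start + offset)
                break
            offset -= length  # skip past this segment
    return timestamps


if __name__ == "__main__":
    # 120 s video, 8-frame budget; two hypothetical queries ground overlapping spans.
    print(uniform_sample(120, 8))
    print(grounded_sample([(40, 50), (45, 60), (90, 95)], 8))
```

With an 8-frame budget, the uniform baseline spreads frames across all 120 seconds, while the grounded variant places six frames in the merged 40 to 60 s span and two in the 90 to 95 s span. This illustrates why concentrating the budget on retrieved moments can keep key events from being diluted by irrelevant content.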
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
