Event-Causal RAG: A Retrieval-Augmented Generation Framework for Long Video Reasoning in Complex Scenarios
Peizheng Yan, Yu Zhao, Liang Xie, Juntong Qi, Mingming Wang, Erwei Yin

TL;DR
Event-Causal RAG is a retrieval-augmented framework that segments streaming videos into structured events, so that long-video reasoning can follow causal dependencies between events while using memory more efficiently than clip-level indexing.
Contribution
The paper proposes a lightweight, event-based retrieval-augmented approach with a structured event graph and bidirectional retrieval for long-video reasoning, surpassing existing clip-based methods.
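As a rough illustration of what an event graph with bidirectional retrieval might look like, the sketch below stores events as nodes and assumed causal links as directed edges, then walks the edges backward (toward causes) and forward (toward effects) from a seed event. This is a toy under stated assumptions, not the paper's implementation: the `Event` and `EventGraph` classes, the `max_hops` parameter, the example events, and the reading of "bidirectional" as cause-and-effect traversal are all hypothetical and are not taken from the paper.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical event record: a time span plus a short text summary."""
    event_id: int
    start_s: float
    end_s: float
    summary: str


@dataclass
class EventGraph:
    """Toy directed graph; an edge a -> b means 'a is assumed to cause b'."""
    events: dict = field(default_factory=dict)
    effects: dict = field(default_factory=dict)  # event_id -> list of effect ids
    causes: dict = field(default_factory=dict)   # event_id -> list of cause ids

    def add_event(self, event):
        self.events[event.event_id] = event
        self.effects.setdefault(event.event_id, [])
        self.causes.setdefault(event.event_id, [])

    def add_causal_edge(self, cause_id, effect_id):
        self.effects[cause_id].append(effect_id)
        self.causes[effect_id].append(cause_id)

    def _walk(self, seed_id, neighbors, max_hops):
        # Breadth-first traversal limited to max_hops edges from the seed.
        seen = {seed_id}
        queue = deque([(seed_id, 0)])
        found = []
        while queue:
            node, depth = queue.popleft()
            if depth == max_hops:
                continue
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    found.append(nxt)
                    queue.append((nxt, depth + 1))
        return found

    def retrieve(self, seed_id, max_hops=2):
        """Return (causes, effects) reachable from the seed within max_hops."""
        return (self._walk(seed_id, self.causes, max_hops),
                self._walk(seed_id, self.effects, max_hops))


# Made-up example: four events spaced ten minutes apart.
graph = EventGraph()
summaries = ["door left open", "dog runs outside",
             "owner searches street", "dog returns home"]
for i, text in enumerate(summaries):
    graph.add_event(Event(i, start_s=i * 600.0, end_s=i * 600.0 + 30.0, summary=text))
graph.add_causal_edge(0, 1)
graph.add_causal_edge(1, 2)
graph.add_causal_edge(1, 3)

causes, effects = graph.retrieve(seed_id=1)
print([graph.events[i].summary for i in causes])
print([graph.events[i].summary for i in effects])
```

In this sketch a query that matches one event can pull in temporally distant causes and consequences through graph edges, without scanning every intermediate clip. How the paper actually builds edges, scores relevance, or bounds the traversal is not stated in this summary.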
Findings
Outperforms clip-based retrieval baselines on long-video benchmarks.
Effectively models causal dependencies across extended temporal gaps.
Improves memory efficiency and streaming robustness.
Abstract
Recent large vision-language models have achieved strong performance on short- and medium-length video understanding, yet they remain inadequate for ultra-long or even infinite video reasoning, where models must preserve coherent memory over extended durations and infer causal dependencies across temporally distant events. Existing end-to-end video understanding methods are fundamentally limited by the complexity of self-attention, while recent retrieval-augmented generation (RAG) approaches still suffer from fragmented clip-level memory, weak modeling of temporal and causal structure, and high storage and online inference costs. We present Event-Causal RAG, a lightweight retrieval-augmented framework for infinite long-video reasoning. Instead of indexing fixed-length clips, our method segments streaming videos into semantically coherent events and represents each event as a…
