Event-Causal RAG: A Retrieval-Augmented Generation Framework for Long Video Reasoning in Complex Scenarios

Peizheng Yan; Yu Zhao; Liang Xie; Juntong Qi; Mingming Wang; Erwei Yin

arXiv:2605.06185·cs.AI·May 8, 2026

TL;DR

Event-Causal RAG is a retrieval-augmented framework that segments streaming videos into structured events, enabling efficient long-video reasoning with causal inference and improved memory management.

Contribution

The paper proposes a lightweight, event-based retrieval-augmented approach with a structured event graph and bidirectional retrieval for long-video reasoning, surpassing existing clip-based methods.

Findings

01 Outperforms clip-based retrieval baselines on long-video benchmarks.

02 Effectively models causal dependencies across extended temporal gaps.

03 Improves memory efficiency and streaming robustness.

Abstract

Recent large vision-language models have achieved strong performance on short- and medium-length video understanding, yet they remain inadequate for ultra-long or even infinite video reasoning, where models must preserve coherent memory over extended durations and infer causal dependencies across temporally distant events. Existing end-to-end video understanding methods are fundamentally limited by the $O(n^{2})$ complexity of self-attention, while recent retrieval-augmented generation (RAG) approaches still suffer from fragmented clip-level memory, weak modeling of temporal and causal structure, and high storage and online inference costs. We present Event-Causal RAG, a lightweight retrieval-augmented framework for infinite long-video reasoning. Instead of indexing fixed-length clips, our method segments streaming videos into semantically coherent events and represents each event as a…
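
The contrast between fixed-length clips and semantically coherent events can be illustrated with a minimal sketch. This is not the paper's segmentation method, which the abstract excerpt does not specify. It assumes each frame is already encoded as an embedding vector and opens a new event whenever a frame's cosine similarity to the running mean of the current event drops below a cutoff. The names `segment_events`, `fixed_clips`, and the `threshold` parameter are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def segment_events(
    frames: Sequence[Sequence[float]], threshold: float = 0.8
) -> List[Tuple[int, int]]:
    """Group consecutive frames into events, returned as half-open ranges [start, end).

    Assumption (not from the paper): a new event starts when a frame's similarity
    to the running mean embedding of the current event falls below `threshold`.
    """
    events: List[Tuple[int, int]] = []
    start = 0
    mean: Optional[List[float]] = None
    for i, frame in enumerate(frames):
        if mean is None:
            # First frame opens the first event.
            mean, start = list(frame), i
        elif cosine(mean, frame) < threshold:
            # Content changed: close the current event and open a new one.
            events.append((start, i))
            mean, start = list(frame), i
        else:
            # Same event: fold the frame into the running mean.
            n = i - start  # frames already in the current event
            mean = [(m * n + x) / (n + 1) for m, x in zip(mean, frame)]
    if mean is not None:
        events.append((start, len(frames)))
    return events


def fixed_clips(n_frames: int, clip_len: int) -> List[Tuple[int, int]]:
    """Baseline for comparison: cut the stream into fixed-length clips."""
    return [(s, min(s + clip_len, n_frames)) for s in range(0, n_frames, clip_len)]
```

On a toy stream whose content changes at frame 3, the similarity-based split places a boundary at the change, while a fixed clip length of 4 produces a first clip that straddles it. The function processes frames one at a time and keeps only the running mean of the open event, which is one simple way to stay compatible with streaming input.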
