ReaSon: Reinforced Causal Search with Information Bottleneck for Video Understanding
Yuan Zhou, Litao Hua, Shilong Jin, Wentao Huang, Haoran Duan

TL;DR
ReaSon is a keyframe-selection framework for video understanding. It combines causal inference with reinforcement learning to improve performance under a limited frame budget, and the improvement holds across multiple datasets.
Contribution
ReaSon proposes a Causal Information Bottleneck (CIB), which defines keyframes as frames that are both predictively sufficient and causally necessary. It pairs the CIB with a reinforcement-learning-based policy network that selects such keyframes.
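The CIB takes its name from the classical information bottleneck (Tishby et al.), which balances compression against prediction. The expression below is that standard objective, not ReaSon's CIB. Reading X as the full video, Z as the selected keyframes and Y as the answer is an illustrative assumption. Under that reading, the I(Z;Y) term corresponds roughly to predictive sufficiency. The causal-necessity part of the CIB is not given in this summary and is not shown.

```latex
\min_{p(z \mid x)} \; \mathcal{L}_{\mathrm{IB}} = I(X;Z) - \beta\, I(Z;Y), \qquad \beta > 0
```

A larger β favors keeping information that predicts Y over compressing X.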
Findings
ReaSon outperforms state-of-the-art methods on the NExT-QA, EgoSchema, and Video-MME benchmarks.
The method effectively captures causal and predictive information with limited frames.
ReaSon demonstrates strong generalization across different video understanding tasks.
Abstract
Keyframe selection has become essential for video understanding with vision-language models (VLMs) due to limited input tokens and the temporal sparsity of relevant information across video frames. Video understanding often relies on effective keyframes that are not only informative but also causally decisive. To this end, we propose Reinforced Causal Search with Information Bottleneck (ReaSon), a framework that formulates keyframe selection as an optimization problem with the help of a novel Causal Information Bottleneck (CIB), which explicitly defines keyframes as those satisfying both predictive sufficiency and causal necessity. Specifically, ReaSon employs a learnable policy network to select keyframes from a visually relevant pool of candidate frames to capture predictive sufficiency, and then assesses causal necessity via counterfactual interventions. Finally, a composite reward…
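The abstract describes a loop with four parts:

1. A learnable policy picks frames from a candidate pool.
2. Predictive sufficiency is scored.
3. Causal necessity is probed with counterfactual interventions.
4. Both scores are combined into a composite reward.

The sketch below shows one way such a loop could fit together. Its assumptions are listed here because the truncated abstract does not give the real reward or training procedure, and this is not ReaSon's method:

- A toy `answer_prob` function stands in for a VLM.
- Sufficiency is the log-probability of the correct answer.
- Necessity is the smallest probability drop when any one selected frame is removed.
- A hypothetical weight `lam` combines the two terms.
- Training uses plain REINFORCE with a moving-average baseline over one free logit per candidate. This is not a network conditioned on the video and question.

All names are hypothetical.

```python
import math
import random

# Hypothetical toy stand-in for a VLM: each candidate frame carries an
# "evidence" value, and more total evidence gives higher answer confidence.
def answer_prob(frames, evidence):
    total = sum(evidence[i] for i in frames)
    return 1.0 - math.exp(-total)

def sufficiency(frames, evidence):
    """Assumed predictive-sufficiency proxy: log P(correct answer | frames)."""
    return math.log(answer_prob(frames, evidence) + 1e-9)

def necessity(frames, evidence):
    """Assumed causal-necessity proxy via counterfactual intervention:
    remove each selected frame in turn and keep the smallest drop in answer
    probability, so the term is high only if every selected frame matters."""
    full = answer_prob(frames, evidence)
    drops = []
    for i in frames:
        counterfactual = [j for j in frames if j != i]  # intervene: drop frame i
        drops.append(full - answer_prob(counterfactual, evidence))
    return min(drops)

def composite_reward(frames, evidence, lam=1.0):
    """Hypothetical composite reward: sufficiency + lam * necessity."""
    return sufficiency(frames, evidence) + lam * necessity(frames, evidence)

def softmax(logits, idx):
    """Softmax restricted to the candidate indices in idx (returns a dict)."""
    m = max(logits[i] for i in idx)
    exps = {i: math.exp(logits[i] - m) for i in idx}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}

def sample_frames(logits, k, rng):
    """Sample k distinct frames sequentially without replacement.
    Returns the frames and the gradient of their log-probability w.r.t. logits."""
    remaining = list(range(len(logits)))
    chosen, grad = [], [0.0] * len(logits)
    for _ in range(k):
        probs = softmax(logits, remaining)
        r, acc, pick = rng.random(), 0.0, remaining[-1]
        for i in remaining:
            acc += probs[i]
            if r < acc:
                pick = i
                break
        for i in remaining:  # d log p(pick) / d logit_i = 1[i == pick] - p(i)
            grad[i] -= probs[i]
        grad[pick] += 1.0
        chosen.append(pick)
        remaining.remove(pick)
    return chosen, grad

def train(evidence, k=2, steps=1000, lr=0.1, seed=0):
    """Plain REINFORCE with a moving-average reward baseline."""
    rng = random.Random(seed)
    logits = [0.0] * len(evidence)
    baseline = 0.0
    for _ in range(steps):
        frames, grad = sample_frames(logits, k, rng)
        reward = composite_reward(frames, evidence)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        for i in range(len(logits)):
            logits[i] += lr * advantage * grad[i]
    return logits
```

With a toy pool such as `evidence = [0.05, 1.5, 0.05, 1.2, 0.05]` and `k=2`, the reward favors the two high-evidence frames, and the learned logits should rank them highest. Taking the minimum over removals, rather than the mean, stops the necessity term from rewarding a pair of one strong frame and one useless frame.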
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
