Towards Neuro-Symbolic Video Understanding

Minkyu Choi; Harsh Goel; Mohammad Omama; Yunhao Yang; Sahil Shah; Sandeep Chinchali

arXiv:2403.11021 · cs.CV · December 4, 2024 · 1 citation

Open Access · 2 Repos

TL;DR

This paper introduces a neuro-symbolic approach for video understanding that combines vision-language models for frame semantics with temporal logic for long-term reasoning, significantly improving event detection accuracy.

Contribution

The paper presents a system that decouples per-frame semantic understanding from temporal reasoning, using state machines and temporal logic to reason about events over long video sequences.

Findings

01. Improved F1 score for complex event identification by 9-15%.

02. Effective long-term reasoning across video frames.

03. Outperforms baseline methods on self-driving datasets.

Abstract

The unprecedented surge in video data production in recent years necessitates efficient tools to extract meaningful frames from videos for downstream tasks. Long-term temporal reasoning is a key desideratum for frame retrieval systems. While state-of-the-art foundation models, like VideoLLaMA and ViCLIP, are proficient in short-term semantic understanding, they surprisingly fail at long-term reasoning across frames. A key reason for this failure is that they intertwine per-frame perception and temporal reasoning into a single deep network. Hence, decoupling but co-designing semantic understanding and temporal reasoning is essential for efficient scene identification. We propose a system that leverages vision-language models for semantic understanding of individual frames but effectively reasons about the long-term evolution of events using state machines and temporal logic (TL) formulae…
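The decoupling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the per-frame vision-language model is replaced by hard-coded confidence scores, the temporal logic is reduced to a single hand-written "a holds until b" pattern, and every name here (`label_frames`, `find_until`, the proposition names, the 0.5 threshold) is a hypothetical choice made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical per-frame detector output: proposition name -> confidence in [0, 1].
# In a real system these scores would come from a vision-language model.
FrameScores = Dict[str, float]


def label_frames(scores: List[FrameScores], threshold: float = 0.5) -> List[Dict[str, bool]]:
    """Perception step: turn per-frame confidences into boolean propositions."""
    return [{name: conf >= threshold for name, conf in frame.items()} for frame in scores]


def find_until(labels: List[Dict[str, bool]], a: str, b: str) -> List[Tuple[int, int]]:
    """Reasoning step: a two-state machine for an "a until b" style pattern.

    Returns (start, end) frame index pairs such that `a` holds on at least one
    frame and on every frame from start to end - 1, and `b` holds at frame end.
    """
    matches: List[Tuple[int, int]] = []
    start = None  # None means idle; an int means tracking a run of `a` since that frame
    for i, frame in enumerate(labels):
        if frame.get(b, False):
            if start is not None:
                matches.append((start, i))  # run of `a` is closed by `b`
            start = None
        elif frame.get(a, False):
            if start is None:
                start = i  # idle -> tracking
        else:
            start = None  # run of `a` broken before `b` appeared
    return matches


# Assumed toy input: six frames with two propositions.
scores = [
    {"pedestrian": 0.9, "car_stopped": 0.1},
    {"pedestrian": 0.8, "car_stopped": 0.2},
    {"pedestrian": 0.7, "car_stopped": 0.9},
    {"pedestrian": 0.1, "car_stopped": 0.1},
    {"pedestrian": 0.9, "car_stopped": 0.1},
    {"pedestrian": 0.2, "car_stopped": 0.3},
]

matches = find_until(label_frames(scores), "pedestrian", "car_stopped")
print(matches)  # [(0, 2)]
```

The point of the split is that `find_until` never sees pixels: it only consumes per-frame boolean labels, so the temporal pattern can be changed without touching the perception step, and the detector can be swapped without touching the reasoning step.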

Taxonomy

Topics: Psychiatry, Mental Health, Neuroscience