Specification-Driven Video Search via Foundation Models and Formal Verification
Yunhao Yang, Jean-Raphaël Gaglione, Sandeep Chinchali, Ufuk Topcu

TL;DR
This paper presents a novel, formal-methods-based approach for automatically searching videos for specific events using foundation models and temporal logic, achieving high precision and efficiency.
Contribution
It introduces a method that maps text-based event descriptions to temporal logic specifications, encodes the video content as an automaton, and formally verifies that automaton against the specifications, combining vision and language models with formal verification.
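The following sketch illustrates what it means for a video to satisfy a temporal-logic specification. It evaluates LTL over finite traces directly on a trace in which each position is the set of atomic propositions assumed to hold in one frame. The proposition names (`pedestrian`, `car_stopped`) are hypothetical. The sketch covers only a small fragment of the logic (negation, conjunction, strong next, until, eventually, always), it writes the formula by hand instead of deriving it from text, and it uses recursion rather than automata, so it is not the paper's verification procedure.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Union

# A finite trace: the set of atomic propositions that hold in each frame.
Trace = List[FrozenSet[str]]


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Not:
    sub: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Next:
    """Strong next: false at the last frame, since no successor exists."""
    sub: "Formula"


@dataclass(frozen=True)
class Until:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Eventually:
    sub: "Formula"


@dataclass(frozen=True)
class Always:
    sub: "Formula"


Formula = Union[Atom, Not, And, Next, Until, Eventually, Always]


def holds(phi: Formula, trace: Trace, i: int = 0) -> bool:
    """Return True if `phi` holds at position `i` of the finite trace."""
    if not 0 <= i < len(trace):
        raise ValueError("position out of range (LTLf traces are non-empty)")
    if isinstance(phi, Atom):
        return phi.name in trace[i]
    if isinstance(phi, Not):
        return not holds(phi.sub, trace, i)
    if isinstance(phi, And):
        return holds(phi.left, trace, i) and holds(phi.right, trace, i)
    if isinstance(phi, Next):
        return i + 1 < len(trace) and holds(phi.sub, trace, i + 1)
    if isinstance(phi, Until):
        # `right` must hold at some k >= i, with `left` holding at every j in [i, k).
        for k in range(i, len(trace)):
            if holds(phi.right, trace, k):
                return True
            if not holds(phi.left, trace, k):
                return False
        return False
    if isinstance(phi, Eventually):
        return any(holds(phi.sub, trace, k) for k in range(i, len(trace)))
    if isinstance(phi, Always):
        return all(holds(phi.sub, trace, k) for k in range(i, len(trace)))
    raise TypeError(f"unsupported formula: {phi!r}")


# Hypothetical query: "a pedestrian appears, and a stopped car is seen in a later frame".
query = Eventually(And(Atom("pedestrian"), Next(Eventually(Atom("car_stopped")))))
frames = [frozenset(), frozenset({"pedestrian"}), frozenset({"car"}), frozenset({"car_stopped"})]
print(holds(query, frames))  # True
```

`Next` uses the strong-next convention that is standard for finite traces. For example, `holds(Next(Atom("a")), [frozenset({"a"})])` is False because the final frame has no successor. Under weak next, it would be True.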
Findings
Achieves over 90% precision in privacy-sensitive video search.
Demonstrates effectiveness on autonomous driving datasets.
Provides qualitative and quantitative validation of the approach.
Abstract
The increasing abundance of video data enables users to search for events of interest, e.g., emergency incidents. Meanwhile, it raises new concerns, such as the need for preserving privacy. Existing approaches to video search require either manual inspection or a deep learning model with massive training. We develop a method that uses recent advances in vision and language models, as well as formal methods, to search for events of interest in video clips automatically and efficiently. The method consists of an algorithm to map text-based event descriptions into linear temporal logic over finite traces (LTLf) and an algorithm to construct an automaton encoding the video information. Then, the method formally verifies the automaton representing the video against the LTLf specifications and adds the pertinent video clips to the search result if the automaton satisfies the…
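The abstract's pipeline can be sketched in a simplified form: label frames, build an automaton for the video, verify it against the specification, and keep the clips that satisfy it. Every concrete choice below is an assumption, not the paper's:

- Per-frame proposition confidences are made-up numbers standing in for vision-model outputs.
- Confidences are thresholded at a hypothetical `tau = 0.5`.
- The video is cut into fixed-length, non-overlapping clips.
- The specification F(pedestrian ∧ X F car_stopped) is compiled into a DFA by hand rather than generated from text.

Because thresholding leaves exactly one label set per frame, the resulting video automaton is a single path, and verification reduces to running the DFA on it. The paper's construction may keep detector uncertainty instead of discarding it this way.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

Label = FrozenSet[str]


@dataclass(frozen=True)
class DFA:
    """Deterministic finite automaton whose alphabet is sets of propositions."""
    initial: int
    accepting: FrozenSet[int]
    delta: Callable[[int, Label], int]

    def accepts(self, word: List[Label]) -> bool:
        state = self.initial
        for label in word:
            state = self.delta(state, label)
        return state in self.accepting


def _spec_delta(state: int, label: Label) -> int:
    # Hand-built DFA for the hypothetical spec F(pedestrian & X F car_stopped):
    # state 0 = waiting for a pedestrian, 1 = waiting for a stopped car in a
    # later frame, 2 = satisfied (accepting sink).
    if state == 0:
        return 1 if "pedestrian" in label else 0
    if state == 1:
        return 2 if "car_stopped" in label else 1
    return 2


SPEC = DFA(initial=0, accepting=frozenset({2}), delta=_spec_delta)


def label_frames(scores: List[Dict[str, float]], tau: float = 0.5) -> List[Label]:
    """Keep a proposition in a frame's label if its confidence is at least tau."""
    return [frozenset(p for p, s in frame.items() if s >= tau) for frame in scores]


def search(scores: List[Dict[str, float]], spec: DFA, clip_len: int,
           tau: float = 0.5) -> List[Tuple[int, int]]:
    """Return [start, end) frame ranges of clips whose labels the spec DFA accepts."""
    labels = label_frames(scores, tau)
    hits = []
    for start in range(0, len(labels), clip_len):
        clip = labels[start:start + clip_len]
        # The simplified video automaton is a single path of labels, so checking
        # it against the specification reduces to running the DFA on that path.
        if spec.accepts(clip):
            hits.append((start, start + len(clip)))
    return hits


# Made-up per-frame confidences, standing in for a vision model's outputs.
SCORES = [
    {"pedestrian": 0.9, "car_stopped": 0.1},
    {"pedestrian": 0.2, "car_stopped": 0.3},
    {"pedestrian": 0.1, "car_stopped": 0.8},
    {"pedestrian": 0.1, "car_stopped": 0.9},
    {"pedestrian": 0.7, "car_stopped": 0.2},
    {"pedestrian": 0.3, "car_stopped": 0.4},
]

if __name__ == "__main__":
    print(search(SCORES, SPEC, clip_len=3))  # [(0, 3)]
```

The threshold trades missed detections against spurious labels. With `tau=0.95`, no proposition survives thresholding in this toy data, and `search` returns an empty list.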
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications
