Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification
S P Sharan, Minkyu Choi, Sahil Shah, Harsh Goel, Mohammad Omama, Sandeep Chinchali

TL;DR
This paper introduces NeuS-V, a formal verification-based metric for evaluating text-to-video models, focusing on temporal fidelity and alignment, and demonstrates its superior correlation with human judgment over existing metrics.
Contribution
The paper presents NeuS-V, a novel neuro-symbolic evaluation method that rigorously assesses text-to-video alignment using formal verification techniques, addressing limitations of current metrics.
Findings
NeuS-V achieves over 5x higher correlation with human evaluations than existing metrics.
Current models perform poorly on temporally complex prompts.
A new dataset of temporally extended prompts is introduced for benchmarking.
Abstract
Recent advancements in text-to-video models such as Sora, Gen-3, MovieGen, and CogVideoX are pushing the boundaries of synthetic video generation, with adoption seen in fields like robotics, autonomous driving, and entertainment. As these models become prevalent, various metrics and benchmarks have emerged to evaluate the quality of the generated videos. However, these metrics emphasize visual quality and smoothness, neglecting temporal fidelity and text-to-video alignment, which are crucial for safety-critical applications. To address this gap, we introduce NeuS-V, a novel synthetic video evaluation metric that rigorously assesses text-to-video alignment using neuro-symbolic formal verification techniques. Our approach first converts the prompt into a formally defined Temporal Logic (TL) specification and translates the generated video into an automaton representation. Then, it…
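To make the "temporal logic specification checked against a video automaton" idea concrete, here is a minimal sketch under loudly stated assumptions. It is not NeuS-V's implementation. The sketch assumes each frame already carries a confidence for two hypothetical atomic propositions, `a` and `b` (for example, scores from a vision-language model), and treats those confidences as independent per-frame probabilities. It then computes the probability that the ordering property "a happens, and b happens at a strictly later frame" (the finite-trace formula F(a ∧ X F b)) holds. It does this by pushing probability mass through a three-state automaton. The function name `prob_a_then_b` and the independence assumption are illustrative only. The paper's actual prompt-to-specification translation and verification step are not reproduced here.

```python
from typing import Sequence


def prob_a_then_b(p_a: Sequence[float], p_b: Sequence[float]) -> float:
    """Probability that proposition `a` holds at some frame and `b` holds at a
    strictly later frame, i.e. the finite-trace property F(a & X F b).

    Hypothetical setup (not from the paper): p_a[t] and p_b[t] are per-frame
    confidences treated as independent Bernoulli probabilities.
    """
    if len(p_a) != len(p_b):
        raise ValueError("p_a and p_b must have one entry per frame")
    # Probability mass over the three automaton states:
    # wait_a (a not yet seen), wait_b (a seen, waiting for b), accept.
    wait_a, wait_b, accept = 1.0, 0.0, 0.0
    for pa, pb in zip(p_a, p_b):
        # Advance wait_b -> accept first, so b must occur after the frame
        # in which a was first observed.
        moved_to_accept = wait_b * pb
        wait_b -= moved_to_accept
        accept += moved_to_accept
        # Then advance wait_a -> wait_b using this frame's confidence for a.
        moved_to_wait_b = wait_a * pa
        wait_a -= moved_to_wait_b
        wait_b += moved_to_wait_b
    return accept


if __name__ == "__main__":
    # Hypothetical 3-frame video: "a" clearly in frame 0, "b" clearly in frame 2.
    print(prob_a_then_b([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))
    # Same events in the wrong order: the ordering property fails.
    print(prob_a_then_b([0.0, 0.0, 1.0], [1.0, 0.0, 0.0]))
```

The point of the design is that an ordering constraint from the prompt becomes a small automaton. Scoring a video then reduces to propagating frame-level evidence through it. The result is a score that rewards events appearing in the right temporal order, rather than merely appearing somewhere in the clip.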
Taxonomy
Topics: Natural Language Processing Techniques · Video Analysis and Summarization · Topic Modeling
