BRITE: A Benchmark for Reliable and Interpretable T2V Evaluation on Implausible Scenarios

Advait Tilak; Jiwon Choi; Nazifa Mouli; Wei Le

arXiv:2605.00873·cs.MM·May 5, 2026


TL;DR

BRITE is a benchmark for reliable, interpretable evaluation of Text-to-Video (T2V) models, especially on implausible scenarios. It is built with a human-in-the-loop protocol and scored with QA-based metrics.

Contribution

BRITE is presented as the first framework to unify implausible prompting, fine-grained assessment of audio-visual consistency, and QA-based interpretable evaluation, addressing gaps in existing T2V evaluation methods.

Findings

01

Models perform well on static object composition.

02

Performance degrades significantly on object-action binding and synchronization.

03

Across the five state-of-the-art models evaluated (Sora 2, Veo 3.1, Runway Gen4.5, Pixverse V5.5, and Qwen3Max), BRITE reveals a critical performance gap.

Abstract

The rapid advancement of photorealistic Text-to-Video (T2V) generation creates an urgent need for up-to-date evaluation methods. Existing benchmarks largely overlook implausible scenarios and do not measure audio-visual alignment. We introduce BRITE, the first framework that unifies (1) implausible prompting, (2) fine-grained assessment of audio-visual consistency, and (3) QA-based interpretable evaluation into a comprehensive T2V benchmark. Unlike fully automated Multimodal LLM-based pipelines, which are prone to hallucination and prompt ambiguity, BRITE guarantees reliability through a rigorous human-in-the-loop protocol for benchmark creation. Evaluating five state-of-the-art models (Sora 2, Veo 3.1, Runway Gen4.5, Pixverse V5.5, and Qwen3Max), we reveal a critical performance gap: while models excel at static object composition, they exhibit significant degradation in…
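
The abstract does not spell out how QA-based scores are computed, so the following is only a minimal sketch of one plausible aggregation, assuming each prompt is decomposed into yes/no questions tagged with a category. The `QAItem` type, the `per_category_accuracy` function, the category names, and the toy questions are all hypothetical illustrations, not BRITE's actual schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QAItem:
    """One yes/no question about a generated video (hypothetical schema)."""
    category: str   # assumed label, e.g. "object_composition"
    question: str
    expected: bool  # answer implied by the text prompt
    observed: bool  # answer given after watching the generated video


def per_category_accuracy(items):
    """Return {category: fraction of questions whose observed answer matches the expected one}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for item in items:
        totals[item.category] += 1
        if item.observed == item.expected:
            correct[item.category] += 1
    return {cat: correct[cat] / totals[cat] for cat in totals}


# Toy, made-up items for illustration only; these are not results from the paper.
items = [
    QAItem("object_composition", "Is there a cat in the video?", True, True),
    QAItem("object_composition", "Is there a piano in the video?", True, True),
    QAItem("object_action_binding", "Is the cat the one playing the piano?", True, False),
    QAItem("object_action_binding", "Do the piano keys move when pressed?", True, True),
]

scores = per_category_accuracy(items)
for category, accuracy in scores.items():
    print(f"{category}: {accuracy:.2f}")
```

Reporting one score per category rather than a single number is what makes this style of evaluation interpretable: a low score can be traced back to the specific questions that failed.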
