VIBE: Annotation-Free Video-to-Text Information Bottleneck Evaluation for TL;DR
Shenghui Chen, Po-han Li, Sandeep Chinchali, Ufuk Topcu

TL;DR
VIBE is an annotation-free method that scores VLM-generated video summaries on grounding and utility, then selects summaries by those scores to improve the accuracy and efficiency of human decision-making.
Contribution
VIBE introduces an annotation-free framework that evaluates and selects video summaries using two metrics, grounding and utility, to improve downstream task performance.
Findings
Summaries selected by VIBE boost task accuracy by up to 61.23%.
VIBE reduces response time by up to 75.77%.
Effective human decision-making is supported without costly annotations.
Abstract
Many decision-making tasks, where both accuracy and efficiency matter, still require human supervision. For example, tasks like traffic officers reviewing hour-long dashcam footage or researchers screening conference videos can benefit from concise summaries that reduce cognitive load and save time. Yet current vision-language models (VLMs) often produce verbose, redundant outputs that hinder task performance. Existing video caption evaluation depends on costly human annotations and overlooks the summaries' utility in downstream tasks. We address these gaps with Video-to-text Information Bottleneck Evaluation (VIBE), an annotation-free method that scores VLM outputs using two metrics: grounding (how well the summary aligns with visual content) and utility (how informative it is for the task). VIBE selects from randomly sampled VLM outputs by ranking them according to the two scores to…
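The selection step described in the abstract (score each sampled VLM output on grounding and utility, rank, pick one) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract shown here is truncated and does not say how the two scores are computed or combined, so the sketch assumes both scores are already available as floats and combines them with a hypothetical weighted sum. The names `ScoredSummary`, `select_summary`, and `grounding_weight` are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredSummary:
    """A candidate summary with precomputed scores (assumed, not from the paper)."""
    text: str
    grounding: float  # how well the summary aligns with the visual content
    utility: float    # how informative the summary is for the task


def select_summary(candidates, grounding_weight=0.5):
    """Return the candidate with the highest combined score.

    The weighted sum is an illustrative assumption; the source does not
    specify how grounding and utility are combined for ranking.
    """
    if not candidates:
        raise ValueError("need at least one candidate summary")
    if not 0.0 <= grounding_weight <= 1.0:
        raise ValueError("grounding_weight must be in [0, 1]")

    def combined(candidate):
        return (grounding_weight * candidate.grounding
                + (1.0 - grounding_weight) * candidate.utility)

    return max(candidates, key=combined)


# Toy candidates with invented scores, for illustration only.
candidates = [
    ScoredSummary("long, rambling description", grounding=0.9, utility=0.2),
    ScoredSummary("short, task-focused summary", grounding=0.7, utility=0.8),
    ScoredSummary("off-topic summary", grounding=0.1, utility=0.3),
]

best = select_summary(candidates)
print(best.text)
```

With equal weights, the task-focused candidate wins because it scores reasonably on both metrics. Setting `grounding_weight=1.0` ranks by grounding alone and would pick the verbose candidate instead, which shows why a selection rule that ignores utility can favor redundant output.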
Taxonomy
Topics: Digital Rights Management and Security · Multimedia Communication and Technology · Video Analysis and Summarization
