VIBE: Annotation-Free Video-to-Text Information Bottleneck Evaluation for TL;DR

Shenghui Chen; Po-han Li; Sandeep Chinchali; Ufuk Topcu

arXiv:2505.17423·cs.CV·September 24, 2025

Open Access · 1 Repo · 1 Dataset

TL;DR

VIBE is an annotation-free evaluation method for video summaries that improves decision-making by selecting summaries based on grounding and utility scores, enhancing task accuracy and efficiency.

Contribution

VIBE introduces a novel annotation-free framework for evaluating and selecting video summaries using grounding and utility metrics, improving downstream task performance.

Findings

01. Summaries selected by VIBE boost task accuracy by up to 61.23%.

02. VIBE reduces response time by up to 75.77%.

03. Effective human decision-making is supported without costly annotations.

Abstract

Many decision-making tasks, where both accuracy and efficiency matter, still require human supervision. For example, tasks like traffic officers reviewing hour-long dashcam footage or researchers screening conference videos can benefit from concise summaries that reduce cognitive load and save time. Yet current vision-language models (VLMs) often produce verbose, redundant outputs that hinder task performance. Existing video caption evaluation depends on costly human annotations and overlooks the summaries' utility in downstream tasks. We address these gaps with Video-to-text Information Bottleneck Evaluation (VIBE), an annotation-free method that scores VLM outputs using two metrics: grounding (how well the summary aligns with visual content) and utility (how informative it is for the task). VIBE selects from randomly sampled VLM outputs by ranking them according to the two scores to…
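
The selection step described in the abstract (score each sampled VLM output for grounding and utility, then rank) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `Candidate` type, the precomputed score values, and the weighted-sum rule with weight `alpha` are hypothetical and are not taken from the paper, which this page does not describe beyond "ranking them according to the two scores".

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One sampled VLM summary with two precomputed scores (hypothetical values)."""
    text: str
    grounding: float  # how well the summary aligns with the visual content
    utility: float    # how informative the summary is for the task


def combined_score(c: Candidate, alpha: float) -> float:
    # Assumed combination rule: a convex weighted sum of the two scores.
    return alpha * c.grounding + (1.0 - alpha) * c.utility


def rank_candidates(candidates, alpha=0.5):
    """Return candidates sorted best-first by the assumed weighted sum."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return sorted(candidates, key=lambda c: combined_score(c, alpha), reverse=True)


def select_summary(candidates, alpha=0.5):
    """Pick the top-ranked candidate; raises if there is nothing to choose from."""
    if not candidates:
        raise ValueError("no candidates to select from")
    return rank_candidates(candidates, alpha)[0]


# Illustrative, made-up scores for three sampled summaries.
samples = [
    Candidate("verbose", grounding=0.9, utility=0.2),
    Candidate("concise", grounding=0.7, utility=0.8),
    Candidate("off-topic", grounding=0.1, utility=0.3),
]
best = select_summary(samples, alpha=0.5)
print(best.text)
```

With equal weighting the sketch prefers the candidate that balances both scores; setting `alpha` to 1.0 ranks by grounding alone, and 0.0 by utility alone.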

Code & Models

Repositories

utaustin-swarmlab/task-aware-tldr-public
tf · Official

Datasets

vivianchen98/LearningPaper24
dataset · 133 downloads

Taxonomy

Topics: Digital Rights Management and Security · Multimedia Communication and Technology · Video Analysis and Summarization