VC-Bench: Pioneering the Video Connecting Benchmark with a Dataset and Evaluation Metrics

Zhiyu Yin; Zhipeng Liu; Kehai Chen; Lemao Liu; Jin Liu; Hong-Dong Li; Yang Xiang; Min Zhang

arXiv:2601.19236·cs.CV·January 28, 2026


Open Access · 3 Reviews

TL;DR

VC-Bench introduces a comprehensive benchmark dataset and evaluation metrics for the emerging task of video connecting, addressing the need for standardized assessment of smooth intermediate video generation between clips.

Contribution

This work presents VC-Bench, a new dataset and evaluation framework specifically designed for the video connecting task, filling a critical gap in standardized benchmarking.

Findings

01. Current models struggle with start-end consistency.
02. Transition smoothness remains a significant challenge.
03. The benchmark reveals limitations in existing video generation methods.

Abstract

While current video generation focuses on text or image conditions, practical applications like video editing and vlogging often need to seamlessly connect separate clips. In our work, we introduce Video Connecting, an innovative task that aims to generate smooth intermediate video content between given start and end clips. However, the absence of standardized evaluation benchmarks has hindered the development of this task. To bridge this gap, we propose VC-Bench, a novel benchmark specifically designed for video connecting. It includes 1,579 high-quality videos collected from public platforms, covering 15 main categories and 72 subcategories to ensure diversity and structure. VC-Bench focuses on three core aspects: Video Quality Score (VQS), Start-End Consistency Score (SECS), and Transition Smoothness Score (TSS). Together, they form a comprehensive framework that moves beyond conventional…
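To make the task setup concrete, the following is a minimal sketch of the input/output shape of video connecting: a start clip, an end clip, and a generated middle segment. It uses a naive linear cross-fade as a stand-in generator and a toy frame-difference measure to compare a hard cut with a connected sequence. The names `crossfade_connect` and `max_frame_jump` are hypothetical, and the measure is an illustrative proxy only; it is not the paper's VQS, SECS, or TSS, whose definitions are not given in this abstract.

```python
from typing import List

Frame = List[float]  # assumption: a flattened grayscale frame with values in [0, 1]


def crossfade_connect(start_clip: List[Frame], end_clip: List[Frame], n_mid: int) -> List[Frame]:
    """Naive baseline: linearly blend the last start frame into the first end frame."""
    a, b = start_clip[-1], end_clip[0]
    mid = []
    for k in range(1, n_mid + 1):
        t = k / (n_mid + 1)  # blend weight strictly between 0 and 1
        mid.append([(1 - t) * x + t * y for x, y in zip(a, b)])
    return mid


def mean_abs_diff(f: Frame, g: Frame) -> float:
    """Mean absolute per-pixel difference between two frames."""
    return sum(abs(x - y) for x, y in zip(f, g)) / len(f)


def max_frame_jump(frames: List[Frame]) -> float:
    """Largest change between consecutive frames (toy proxy, not the paper's TSS)."""
    return max(mean_abs_diff(f, g) for f, g in zip(frames, frames[1:]))


# Two tiny two-pixel clips: an all-black start and an all-white end.
start = [[0.0, 0.0], [0.0, 0.0]]
end = [[1.0, 1.0], [1.0, 1.0]]

mid = crossfade_connect(start, end, n_mid=3)
hard_cut = max_frame_jump(start + end)          # clips joined directly
connected = max_frame_jump(start + mid + end)   # clips joined via generated frames
```

A real video connecting model would replace the cross-fade with learned generation conditioned on both clips; the sketch only shows that inserting intermediate frames can reduce the largest frame-to-frame jump at the boundary.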

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

1. The proposed task is highly valuable and meaningful for future research.
2. This paper is well-organized and easy to read.

Weaknesses

1. The paper needs a more in-depth comparative analysis beyond the task definition to clarify what specific innovations have been made in constructing this benchmark, especially compared with the existing First-Last Frame to Video task.
2. Compared with the existing First-Last Frame to Video task, what different requirements does the video connecting task impose on the generation model? Are there corresponding experiment results to support this in the evaluation?

Reviewer 02 · Rating 6 · Confidence 4

Strengths

- Novel Task Definition: Clear formulation of the Video Connecting task as a distinct challenge, bridging isolated generation and temporal continuity.
- Comprehensive Benchmark Design: A well-curated dataset with rigorous filtering, aesthetic scoring, and scene detection ensures quality and diversity.

Weaknesses

- Model Diversity: Evaluation excludes closed-source systems (e.g., Sora, Runway Gen-3) that might exhibit different performance trends.
- Metric Interpretability: Some metrics (e.g., Video Connecting Distance) could benefit from additional qualitative examples to illustrate their perceptual meaning.
- Minor Writing Artifacts: Occasional typographical spacing and minor stylistic inconsistencies could be refined.

Reviewer 03 · Rating 4 · Confidence 4

Strengths

- The paper formalizes the Video Connecting task, which, while related to existing video generation problems, presents a non-trivial challenge. The paper provides a valuable comparison by adapting and evaluating several recent state-of-the-art video generation models for this new task.
- The paper offers a detailed pipeline for the VC-Bench dataset construction and the calculation of the proposed evaluation metrics.

Weaknesses

- The long-term impact of the benchmark may be limited, as Video Connecting could be viewed as a niche or minor task rather than a foundational problem. There is significant overlap with existing video generation, extension, or interpolation tasks, and few works are specifically dedicated to this problem, which may limit the benchmark's adoption.
- The dataset construction pipeline (e.g., scene detection, clip filtering, captioning) and core evaluation metrics (particularly the Video Quality Sc


Taxonomy

Topics: Image and Video Quality Assessment · Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis