VC-Bench: Pioneering the Video Connecting Benchmark with a Dataset and Evaluation Metrics
Zhiyu Yin, Zhipeng Liu, Kehai Chen, Lemao Liu, Jin Liu, Hong-Dong Li, Yang Xiang, Min Zhang

TL;DR
VC-Bench introduces a comprehensive benchmark dataset and evaluation metrics for the emerging task of video connecting, addressing the need for standardized assessment of smooth intermediate video generation between clips.
Contribution
This work presents VC-Bench, a new dataset and evaluation framework specifically designed for the video connecting task, filling a critical gap in standardized benchmarking.
Findings
Current models struggle with start-end consistency
Transition smoothness remains a significant challenge
Benchmark reveals limitations in existing video generation methods
Abstract
While current video generation focuses on text or image conditions, practical applications like video editing and vlogging often need to seamlessly connect separate clips. In our work, we introduce Video Connecting, an innovative task that aims to generate smooth intermediate video content between given start and end clips. However, the absence of standardized evaluation benchmarks has hindered the development of this task. To bridge this gap, we propose VC-Bench, a novel benchmark specifically designed for video connecting. It includes 1,579 high-quality videos collected from public platforms, covering 15 main categories and 72 subcategories to ensure diversity and structure. VC-Bench focuses on three core aspects: Video Quality Score (VQS), Start-End Consistency Score (SECS), and Transition Smoothness Score (TSS). Together, they form a comprehensive framework that moves beyond conventional…
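To make the three-aspect framing more concrete, the following is a minimal sketch of what consistency-style and smoothness-style scores over per-frame features could look like. It is not the paper's definition of SECS or TSS: the function names, the use of cosine similarity, and the assumption that each frame is already represented as a feature vector are all hypothetical choices made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def mean_vector(frames):
    """Element-wise mean of a non-empty list of frame feature vectors."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]

def toy_start_end_consistency(start, generated, end):
    """Hypothetical proxy (not the paper's SECS): average similarity of the
    generated segment's mean feature to the start and end clips' mean features."""
    g = mean_vector(generated)
    return 0.5 * (cosine(mean_vector(start), g) + cosine(g, mean_vector(end)))

def toy_transition_smoothness(start, generated, end):
    """Hypothetical proxy (not the paper's TSS): mean cosine similarity of
    adjacent frames across the concatenated start + generated + end video."""
    frames = start + generated + end
    sims = [cosine(a, b) for a, b in zip(frames, frames[1:])]
    return sum(sims) / len(sims)
```

Under these assumptions, a generated segment whose features match both given clips scores close to 1 on both proxies, while an abrupt change at either boundary lowers the smoothness proxy. Real evaluations would replace the toy vectors with features from a pretrained video or image encoder.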
Peer Reviews
Decision · Submitted to ICLR 2026
1. The proposed task is highly valuable and meaningful for future research.
2. This paper is well-organized and easy to read.

1. The paper needs a more in-depth comparative analysis beyond the task definition to clarify what specific innovations have been made in constructing this benchmark, especially compared with the existing First-Last Frame to Video task.
2. Compared with the existing First-Last Frame to Video task, what different requirements does the video connecting task impose on the generation model? Are there corresponding experiment results to support this in the evaluation?

- Novel Task Definition: Clear formulation of the Video Connecting task as a distinct challenge, bridging isolated generation and temporal continuity.
- Comprehensive Benchmark Design: A well-curated dataset with rigorous filtering, aesthetic scoring, and scene detection ensures quality and diversity.

- Model Diversity: Evaluation excludes closed-source systems (e.g., Sora, Runway Gen-3) that might exhibit different performance trends.
- Metric Interpretability: Some metrics (e.g., Video Connecting Distance) could benefit from additional qualitative examples to illustrate their perceptual meaning.
- Minor Writing Artifacts: Occasional typographical spacing and minor stylistic inconsistencies could be refined.

- The paper formalizes the Video Connecting task, which, while related to existing video generation problems, presents a non-trivial challenge. The paper provides a valuable comparison by adapting and evaluating several recent state-of-the-art video generation models for this new task.
- The paper offers a detailed pipeline for the VC-Bench dataset construction and the calculation of the proposed evaluation metrics.

- The long-term impact of the benchmark may be limited, as Video Connecting could be viewed as a niche or minor task rather than a foundational problem. There is significant overlap with existing video generation, extension, or interpolation tasks, and few works are specifically dedicated to this problem, which may limit the benchmark's adoption.
- The dataset construction pipeline (e.g., scene detection, clip filtering, captioning) and core evaluation metrics (particularly the Video Quality Sc
Taxonomy
Topics: Image and Video Quality Assessment · Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis
