Long-CODE: Isolating Pure Long-Context as an Orthogonal Dimension in Video Evaluation

Zhijiang Tang; Jiaxin Qi; Bing Zhao; Jianqiang Huang

arXiv:2604.17428 · cs.CV · April 21, 2026

TL;DR

This paper introduces Long-CODE, a new framework and benchmark for evaluating long-video generation quality, focusing on long-range attributes like narrative consistency, which existing short-video metrics overlook.

Contribution

The paper proposes a novel long-video evaluation metric based on shot dynamics and introduces a dedicated dataset with human annotations for long-range video assessment.

Findings

1. The new metric correlates highly with human judgments.
2. Existing short-video metrics are insensitive to structural long-range inconsistencies.
3. Long-CODE provides a comprehensive benchmark for long-video evaluation.
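
The first finding, agreement between a metric and human judgments, is commonly checked with a rank correlation. The sketch below is illustrative only: this page does not say which statistic the authors report, so Spearman's rank correlation is an assumption here, and the scores and ratings are invented toy values, not results from the paper. The names `average_ranks`, `pearson`, `spearman`, `metric_scores`, and `human_ratings` are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values for illustration only (NOT data from the paper):
# one metric score and one human rating per video.
metric_scores = [0.2, 0.5, 0.4, 0.9, 0.7]
human_ratings = [1, 3, 2, 5, 4]

rho = spearman(metric_scores, human_ratings)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based statistic is a natural choice in this setting because it depends only on whether the metric orders videos the way annotators do, not on the scale of either score.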

Abstract

As video generation models achieve unprecedented capabilities, the demand for robust video evaluation metrics becomes increasingly critical. Traditional metrics are intrinsically tailored for short-video evaluation, predominantly assessing frame-level visual quality and localized temporal smoothness. However, as state-of-the-art video generation models scale to generate longer videos, these metrics fail to capture essential long-range characteristics, such as narrative richness and global causal consistency. Recognizing that short-term visual perception and long-context attributes are fundamentally orthogonal dimensions, we argue that long-video metrics should be disentangled from short-video assessments. In this paper, we focus on the rigorous justification and design of a dedicated framework for long-video evaluation. We first introduce a suite of long-video attribute corruption…
