ViStoryBench: Comprehensive Benchmark Suite for Story Visualization

Cailin Zhuang; Ailin Huang; Yaoqi Hu; Jingwei Wu; Wei Cheng; Jiaqi Liao; Hongyuan Wang; Xinyao Liao; Weiwei Cai; Hengyuan Xu; Xuanyang Zhang; Xianfang Zeng; Zhewei Huang; Gang Yu; Chi Zhang

arXiv:2505.24862 · cs.CV · March 31, 2026

TL;DR

ViStoryBench is a comprehensive benchmark suite for evaluating story visualization models across diverse narratives, styles, and characters, addressing limitations of existing benchmarks.

Contribution

It introduces a richly annotated, multi-dimensional benchmark with automated metrics and human verification, enabling systematic evaluation of story visualization models.

Findings

01. Validated the automated metrics through human studies.
02. Systematically assessed a broad range of story visualization models.
03. Highlighted the strengths and weaknesses of current models.

Abstract

Story visualization aims to generate coherent image sequences that faithfully represent a narrative and match given character references. Despite progress in generative models, existing benchmarks remain narrow in scope: they are often limited to short prompts, lack character references, or cover only single-image cases. As a result, they fail to reflect real-world narrative complexity and obscure true model performance. We introduce ViStoryBench, a comprehensive benchmark designed to evaluate story visualization models across varied narrative structures, visual styles, and character settings. It features richly annotated multi-shot scripts derived from curated stories spanning literature, film, and folklore. Large language models assist in story summarization and script generation, and humans verify all outputs for coherence and fidelity. Character references are carefully curated to maintain consistency…
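To make the idea of a "multi-shot script with character references" concrete, here is a minimal sketch of how one benchmark entry might be represented. This is an illustrative assumption, not the format used in the vistorybench/vistorybench repository: the class names, field names, style label, and the `missing_references` helper are all hypothetical. The helper checks that every character appearing in a shot has at least one curated reference image, which is a precondition for evaluating character consistency.

```python
from dataclasses import dataclass, field

# Hypothetical schema for illustration only; all names are assumptions,
# not the data format of the vistorybench/vistorybench repository.


@dataclass
class Character:
    name: str
    reference_images: list[str]  # paths to curated reference images


@dataclass
class Shot:
    index: int
    description: str  # scene/plot text for this shot
    characters: list[str]  # names of characters who appear in the shot


@dataclass
class StoryScript:
    story_id: str
    style: str  # visual style label (assumed free-form string)
    characters: dict[str, Character] = field(default_factory=dict)
    shots: list[Shot] = field(default_factory=list)

    def missing_references(self) -> list[tuple[int, str]]:
        """Return (shot index, character name) pairs lacking a reference image."""
        missing = []
        for shot in self.shots:
            for name in shot.characters:
                char = self.characters.get(name)
                if char is None or not char.reference_images:
                    missing.append((shot.index, name))
        return missing


# Usage: a two-shot story where the second shot introduces an
# undeclared character, so the check flags it.
script = StoryScript(
    story_id="demo",
    style="watercolor",
    characters={"Fox": Character("Fox", ["refs/fox_01.png"])},
    shots=[
        Shot(0, "The fox enters the forest.", ["Fox"]),
        Shot(1, "The fox meets a crow.", ["Fox", "Crow"]),
    ],
)
print(script.missing_references())  # [(1, 'Crow')]
```

Keeping character references in a per-story dictionary, rather than repeating them inside each shot, reflects the assumption that one set of references should govern a character's appearance across the whole sequence.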


Code & Models

Repositories

vistorybench/vistorybench (GitHub)
