Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling
Zilyu Ye, Jinxiu Liu, Ruotian Peng, Jinjin Cao, Zhiyang Chen, Yiyang Zhang, Ziwei Xuan, Mingyuan Zhou, Xiaoqian Shen, Mohamed Elhoseiny, Qi Liu, Guo-Jun Qi

TL;DR
Openstory++ introduces a large-scale, instance-aware dataset and benchmark for open-domain visual storytelling, enabling models to generate consistent, high-quality narratives across complex, multi-instance visual data.
Contribution
It provides a novel dataset with instance-level annotations and a new benchmark framework for evaluating long-context multimodal generation tasks.
Findings
Models trained on Openstory++ produce higher-quality visual stories than models trained on previous datasets.
Models trained on Openstory++ show improved consistency in multi-instance scenarios.
Cohere-Bench effectively evaluates models on long-context multimodal tasks.
Abstract
Recent image generation models excel at creating high-quality images from brief captions. However, they fail to maintain consistency of multiple instances across images when encountering lengthy contexts. This inconsistency is largely due to the absence of granular instance feature labeling in existing training datasets. To tackle these issues, we introduce Openstory++, a large-scale dataset combining additional instance-level annotations with both images and text. Furthermore, we develop a training methodology that emphasizes entity-centric image-text generation, ensuring that the models learn to effectively interweave visual and textual information. Specifically, Openstory++ streamlines the process of keyframe extraction from open-domain videos, employing vision-language models to generate captions that are then polished by a large language model for…
Taxonomy
Topics: Digital Storytelling and Education · Video Analysis and Summarization · Artificial Intelligence in Games
