Re:Verse -- Can Your VLM Read a Manga?

Aaditya Baranwal; Madhav Kataria; Naitik Agrawal; Yogesh S Rawat; Shruti Vyas

arXiv:2508.08508 · cs.CV · April 23, 2026



TL;DR

This paper investigates the limitations of current Vision Language Models in understanding manga narratives, highlighting their struggles with temporal reasoning, character consistency, and story coherence across extended sequences.

Contribution

The paper introduces an evaluation framework that combines fine-grained multimodal annotation, cross-modal embedding analysis, and retrieval-augmented assessment to systematically assess narrative understanding in VLMs.

Findings

01 Current models excel at interpreting individual panels but fail at causal and temporal reasoning across panels.

02 Applied to the Re:Zero manga, the framework reveals significant gaps in story-level comprehension.

03 The study provides actionable insights and a foundation for future evaluation of narrative intelligence.

Abstract

Current Vision Language Models (VLMs) demonstrate a critical gap between surface-level recognition and deep narrative reasoning when processing sequential visual storytelling. Through a comprehensive investigation of manga narrative understanding, we reveal that while recent large multimodal models excel at individual panel interpretation, they systematically fail at temporal causality and cross-panel cohesion, core requirements for coherent story comprehension. We introduce a novel evaluation framework that combines fine-grained multimodal annotation, cross-modal embedding analysis, and retrieval-augmented assessment to systematically characterize these limitations. Our methodology includes (i) a rigorous annotation protocol linking visual elements to narrative structure through aligned light novel text, (ii) comprehensive evaluation across multiple reasoning paradigms, including…
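As a rough illustration of what a cross-modal retrieval check of this kind can look like, the sketch below scores how often a panel embedding retrieves its aligned text passage among the top k candidates. It is not the authors' implementation: the function names `cosine` and `recall_at_k`, the index-based alignment between panels and text, and the toy vectors are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recall_at_k(panel_embs, text_embs, k=1):
    """Fraction of panels whose aligned text ranks in the top k by cosine similarity.

    Assumption for this sketch: panel i is aligned with text i.
    """
    if len(panel_embs) != len(text_embs):
        raise ValueError("panel and text embeddings must be aligned one-to-one")
    if not panel_embs:
        return 0.0
    hits = 0
    for i, panel in enumerate(panel_embs):
        sims = [cosine(panel, text) for text in text_embs]
        # Rank candidate text indices from most to least similar.
        ranked = sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(panel_embs)


# Toy, hand-made vectors (hypothetical; real embeddings would come from a VLM encoder).
panels = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
texts = [[0.9, 0.1], [0.2, 0.8], [1.0, -1.0]]

print(recall_at_k(panels, texts, k=1))
print(recall_at_k(panels, texts, k=3))
```

In this toy setup the third panel's aligned text is deliberately a poor match, so top-1 recall falls below 1.0 while top-3 recall is trivially perfect with only three candidates.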


Code & Models

Repositories

Datasets

sochastic/Re-Verse
dataset · 30 downloads
