One missing piece in Vision and Language: A Survey on Comics Understanding
Emanuele Vivoli, Mohamed Ali Souibgui, Andrey Barsky, Artemis Llabrés, Marco Bertini, Dimosthenis Karatzas

TL;DR
This survey reviews the unique challenges and recent advances in Comics Understanding, emphasizing datasets, tasks, and a new framework to guide future vision-language research in this complex domain.
Contribution
It introduces the LoCU framework, a novel taxonomy for comics tasks, and provides a comprehensive analysis of datasets, methods, and future research directions.
Findings
Comics have unique visual and textual structures that challenge existing AI models.
The LoCU framework redefines comics-related vision-language tasks for better understanding.
Current datasets and methods are categorized, highlighting gaps and future opportunities.
Abstract
Vision-language models have recently evolved into versatile systems capable of high performance across a range of tasks, such as document understanding, visual question answering, and grounding, often in zero-shot settings. Comics Understanding, a complex and multifaceted field, stands to greatly benefit from these advances. Comics, as a medium, combine rich visual and textual narratives, challenging AI models with tasks that span image classification, object detection, instance segmentation, and deeper narrative comprehension through sequential panels. However, the unique structure of comics -- characterized by creative variations in style, reading order, and non-linear storytelling -- presents a set of challenges distinct from those in other visual-language domains. In this survey, we present a comprehensive review of Comics Understanding from both dataset and task perspectives. Our…
Taxonomy
Topics: Comics and Graphic Narratives · Language, Metaphor, and Cognition
Methods: Sparse Evolutionary Training
