CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding
Emanuele Vivoli, Marco Bertini, Dimosthenis Karatzas

TL;DR
CoMix is a benchmark for evaluating comic understanding models on multiple tasks and across comic styles. It exposes the gap between current models and human performance and is intended to support future work in the field.
Contribution
The paper introduces CoMix, a multi-task benchmark with expanded annotations and diverse styles, enabling evaluation of models' transferability and multi-task capabilities in comic analysis.
Findings
Significant performance gap between humans and models.
Models struggle with multi-task and style transfer.
Benchmark promotes progress in comic understanding.
Abstract
The comic domain is rapidly advancing with the development of single-page analysis and synthesis models. However, evaluation metrics and datasets lag behind, often limited to small-scale or single-style test sets. We introduce a novel benchmark, CoMix, designed to evaluate the multi-task capabilities of models in comic analysis. Unlike existing benchmarks that focus on isolated tasks such as object detection or text recognition, CoMix addresses a broader range of tasks including object detection, speaker identification, character re-identification, reading order, and multi-modal reasoning tasks like character naming and dialogue generation. Our benchmark comprises three existing datasets with expanded annotations to support multi-task evaluation. To mitigate the over-representation of manga-style data, we have incorporated a new dataset of carefully selected American comic-style books,…
Taxonomy
Topics: Comics and Graphic Narratives · Artificial Intelligence in Games · Educational Games and Gamification
