CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding

Emanuele Vivoli; Marco Bertini; Dimosthenis Karatzas

arXiv:2407.03550 · cs.CV · November 1, 2024

TL;DR

CoMix is a new comprehensive benchmark dataset designed to evaluate multi-task comic understanding models across diverse tasks, styles, and settings, highlighting current performance gaps and fostering future advancements.

Contribution

The paper introduces CoMix, a multi-task benchmark with expanded annotations and diverse styles, enabling evaluation of models' transferability and multi-task capabilities in comic analysis.

Findings

1. There is a significant performance gap between humans and models.
2. Models struggle in multi-task settings and with transfer across comic styles.
3. The benchmark is intended to promote progress in comic understanding.

Abstract

The comic domain is rapidly advancing with the development of single-page analysis and synthesis models. However, evaluation metrics and datasets lag behind, often limited to small-scale or single-style test sets. We introduce a novel benchmark, CoMix, designed to evaluate the multi-task capabilities of models in comic analysis. Unlike existing benchmarks that focus on isolated tasks such as object detection or text recognition, CoMix addresses a broader range of tasks including object detection, speaker identification, character re-identification, reading order, and multi-modal reasoning tasks like character naming and dialogue generation. Our benchmark comprises three existing datasets with expanded annotations to support multi-task evaluation. To mitigate the over-representation of manga-style data, we have incorporated a new dataset of carefully selected American comic-style books,…

Code & Models

Repositories

emanuelevivoli/CoMix
PyTorch · Official

Videos

CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding · SlidesLive

Taxonomy

Topics: Comics and Graphic Narratives · Artificial Intelligence in Games · Educational Games and Gamification