Unlocking Comics: The AI4VA Dataset for Visual Understanding

Peter Grönquist; Deblina Bhattacharjee; Bahar Aydemir; Baran Ozaydin; Tong Zhang; Mathieu Salzmann; Sabine Süsstrunk

arXiv:2410.20459 · cs.CV · October 29, 2024

TL;DR

This paper introduces a comprehensive, multi-task dataset of 1950s Franco-Belgian comics, designed to advance visual understanding and digital humanities applications through diverse annotations and styles.

Contribution

It provides a novel, multi-style comic dataset with annotations for depth, segmentation, saliency, and character recognition, filling a gap in multimodal and artistic data resources.
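To make the multi-task annotation idea concrete, here is a minimal sketch of how one annotated comic page might be represented in code. It is an illustration only: the class name `ComicPageSample`, its fields, the style labels, and the helper `filter_by_task` are all hypothetical and are not taken from the AI4VA repository, whose actual file layout is not described here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical task names mirroring the four annotation types in the summary.
TASKS = ("depth", "segmentation", "saliency", "characters")


@dataclass
class ComicPageSample:
    """One comic page with optional per-task annotations (assumed layout)."""
    image_path: str
    style: str  # label for one of the two art styles; values are an assumption
    depth_path: Optional[str] = None
    segmentation_path: Optional[str] = None
    saliency_path: Optional[str] = None
    characters: List[str] = field(default_factory=list)

    def available_tasks(self) -> List[str]:
        """Return the tasks this page carries annotations for, in TASKS order."""
        present = []
        if self.depth_path:
            present.append("depth")
        if self.segmentation_path:
            present.append("segmentation")
        if self.saliency_path:
            present.append("saliency")
        if self.characters:
            present.append("characters")
        return present


def filter_by_task(samples: List[ComicPageSample], task: str) -> List[ComicPageSample]:
    """Keep only the samples that have an annotation for the given task."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return [s for s in samples if task in s.available_tasks()]
```

A multi-task training loop could call `filter_by_task(samples, "depth")` to build a per-task subset while still sharing the same page images across tasks.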

Findings

01. Dataset enables multi-task learning for comics analysis

02. Supports digital art and storytelling innovation

03. Facilitates research in computational creativity

Abstract

In the evolving landscape of deep learning, there is a pressing need for more comprehensive datasets capable of training models across multiple modalities. Concurrently, in digital humanities, there is a growing demand to leverage technology for diverse media adaptation and creation, but progress is limited by datasets that remain sparse due to copyright and stylistic constraints. Addressing this gap, our paper presents a novel dataset comprising Franco-Belgian comics from the 1950s, annotated for tasks including depth estimation, semantic segmentation, saliency detection, and character identification. It consists of two distinct and consistent styles and incorporates object concepts and labels taken from natural images. By including such diverse information across styles, this dataset not only holds promise for computational creativity but also offers avenues for the digitization of art and storytelling…

Code & Models

Repositories

ivrl/ai4va
PyTorch · Official

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Comics and Graphic Narratives