Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection

Yingxuan Li; Kiyoharu Aizawa; Yusuke Matsui

arXiv:2306.17469 · cs.CV · April 23, 2024 · 5 cites

TL;DR

This paper introduces Manga109Dialog, the largest dataset for comics speaker detection, and proposes a scene graph-based deep learning method that outperforms existing approaches with over 75% accuracy.

Contribution

The paper presents Manga109Dialog, a large-scale annotated dataset, together with a novel scene graph-based deep learning method for comics speaker detection.

Findings

01

The proposed method achieves over 75% accuracy.

02

Manga109Dialog contains 132,692 speaker-to-text pairs.

03

Scene graph models outperform distance-based methods.
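
For context on what a distance-based method looks like, here is a minimal sketch of one possible baseline: each text region is assigned to the character whose bounding-box center is nearest. This page does not describe the baselines the paper actually compares against, so the center-to-center Euclidean distance, the `(x_min, y_min, x_max, y_max)` box format, and the names `box_center` and `nearest_speaker` are all assumptions made for illustration.

```python
from math import hypot


def box_center(box):
    """Center of a box given as (x_min, y_min, x_max, y_max). Format is assumed."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def nearest_speaker(text_box, character_boxes):
    """Return the id of the character closest to the text box, or None.

    character_boxes maps a (hypothetical) character id to its bounding box.
    Distance is measured between box centers; this is an illustrative
    choice, not the paper's definition.
    """
    if not character_boxes:
        return None
    tx, ty = box_center(text_box)

    def distance(item):
        cx, cy = box_center(item[1])
        return hypot(cx - tx, cy - ty)

    return min(character_boxes.items(), key=distance)[0]


# Toy page with two characters and two speech balloons.
characters = {"A": (0, 0, 10, 10), "B": (100, 0, 110, 10)}
balloon_near_a = (12, 0, 20, 8)
balloon_near_b = (90, 0, 98, 8)
print(nearest_speaker(balloon_near_a, characters))
print(nearest_speaker(balloon_near_b, characters))
```

A rule like this looks only at geometry, so it has no way to handle a balloon that sits closer to a listener than to the speaker. That limitation is the usual motivation for models that reason over relations between page elements.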

Abstract

The expanding market for e-comics has spurred interest in the development of automated methods to analyze comics. For further understanding of comics, an automated approach is needed to link text in comics to the characters speaking the words. Comics speaker detection research has practical applications, such as automatic character assignment for audiobooks, automatic translation according to characters' personalities, and inference of character relationships and stories. To deal with the problem of insufficient speaker-to-text annotations, we created a new annotation dataset, Manga109Dialog, based on Manga109. Manga109Dialog is the world's largest comics speaker annotation dataset, containing 132,692 speaker-to-text pairs. We further divided our dataset into different levels by prediction difficulty to evaluate speaker detection methods more appropriately. Unlike existing methods mainly…

Taxonomy

Topics: Natural Language Processing Techniques · Translation Studies and Practices · Comics and Graphic Narratives