Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection
Yingxuan Li, Kiyoharu Aizawa, Yusuke Matsui

TL;DR
This paper introduces Manga109Dialog, the largest dataset for comics speaker detection, and proposes a scene graph-based deep learning method that outperforms existing approaches with over 75% accuracy.
Contribution
The paper contributes Manga109Dialog, a large-scale annotated dataset built on Manga109, and a scene graph-based deep learning method for comics speaker detection.
Findings
The proposed method achieves over 75% accuracy.
Manga109Dialog contains 132,692 speaker-to-text pairs.
Scene graph models outperform distance-based methods.
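To make the comparison with distance-based methods concrete, here is a minimal sketch of what such a baseline can look like: each text box is assigned to the character whose bounding-box center is nearest. This is an illustration only, not the authors' implementation or any baseline from the paper. The box format `(xmin, ymin, xmax, ymax)`, the function names, and the toy data are all assumptions.

```python
from math import hypot


def center(box):
    """Center of a box given as (xmin, ymin, xmax, ymax); this format is an assumption."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2, (ymin + ymax) / 2)


def nearest_speaker(text_box, character_boxes):
    """Return the id of the character whose box center is closest to the text box center.

    character_boxes maps a hypothetical character id to its box.
    Returns None when there are no candidate characters.
    """
    if not character_boxes:
        return None
    tx, ty = center(text_box)

    def distance(item):
        cx, cy = center(item[1])
        return hypot(cx - tx, cy - ty)

    return min(character_boxes.items(), key=distance)[0]


def accuracy(predictions, ground_truth):
    """Fraction of texts whose predicted speaker matches the annotated speaker."""
    if not ground_truth:
        return 0.0
    correct = sum(1 for text_id, speaker in ground_truth.items()
                  if predictions.get(text_id) == speaker)
    return correct / len(ground_truth)


# Toy frame (made-up coordinates): two characters and two text boxes.
characters = {"A": (0, 0, 100, 200), "B": (300, 0, 400, 200)}
texts = {"t1": (110, 10, 160, 60), "t2": (240, 10, 290, 60)}

predictions = {text_id: nearest_speaker(box, characters)
               for text_id, box in texts.items()}

# Suppose character A actually speaks both lines: t2 sits closer to B,
# so the nearest-center rule gets it wrong.
ground_truth = {"t1": "A", "t2": "A"}
print(predictions, accuracy(predictions, ground_truth))
```

In the toy frame, the second text box is spoken by A but lies closer to B, so a purely geometric rule misattributes it. Cases like this are one plausible reason a model that also uses learned relationships between characters and text could do better than distance alone.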
Abstract
The expanding market for e-comics has spurred interest in the development of automated methods to analyze comics. For further understanding of comics, an automated approach is needed to link text in comics to characters speaking the words. Comics speaker detection research has practical applications, such as automatic character assignment for audiobooks, automatic translation according to characters' personalities, and inference of character relationships and stories. To deal with the problem of insufficient speaker-to-text annotations, we created a new annotation dataset Manga109Dialog based on Manga109. Manga109Dialog is the world's largest comics speaker annotation dataset, containing 132,692 speaker-to-text pairs. We further divided our dataset into different levels by prediction difficulties to evaluate speaker detection methods more appropriately. Unlike existing methods mainly…
Taxonomy
Topics: Natural Language Processing Techniques · Translation Studies and Practices · Comics and Graphic Narratives
