Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names
Ragav Sachdeva, Gyungin Shin, Andrew Zisserman

TL;DR
This paper presents Magiv2, a model that automatically generates complete manga chapter transcripts with character names and dialogue attribution, improving accessibility for visually impaired readers.
Contribution
It introduces Magiv2 for high-precision manga transcript generation, extends the PopManga dataset with detailed annotations, and provides a large character bank dataset for improved character identification.
Findings
Magiv2 achieves significantly higher precision in speaker diarisation than prior works.
The extended PopManga dataset includes detailed annotations for better dialogue understanding.
The character bank dataset contains over 11K characters with images and chapter appearances.
Abstract
Enabling engagement of manga by visually impaired individuals presents a significant challenge due to its inherently visual nature. With the goal of fostering accessibility, this paper aims to generate a dialogue transcript of a complete manga chapter, entirely automatically, with a particular emphasis on ensuring narrative consistency. This entails identifying (i) what is being said, i.e., detecting the texts on each page and classifying them into essential vs non-essential, and (ii) who is saying it, i.e., attributing each dialogue to its speaker, while ensuring the same characters are named consistently throughout the chapter. To this end, we introduce: (i) Magiv2, a model that is capable of generating high-quality chapter-wide manga transcripts with named characters and significantly higher precision in speaker diarisation over prior works; (ii) an extension of the PopManga…
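The two sub-tasks named in the abstract (deciding which texts are essential, and attributing each dialogue to a named speaker) can be illustrated with a minimal sketch of how per-text predictions might be assembled into a transcript. This is not Magiv2's code or API: the `TextBox` structure, the `build_transcript` function, the "Unknown" fallback, and the sample data are all hypothetical, and the input is assumed to be already sorted in reading order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextBox:
    """One detected text region on a page (hypothetical structure)."""
    page: int
    text: str
    essential: bool          # True for dialogue, False for e.g. sound effects
    speaker: Optional[str]   # predicted character name, or None if unattributed


def build_transcript(boxes: List[TextBox]) -> List[str]:
    """Drop non-essential texts and prefix each remaining text with its speaker.

    Assumes `boxes` is already in reading order; no sorting is done here.
    """
    lines = []
    for box in boxes:
        if not box.essential:
            continue
        # Fall back to a placeholder when no speaker was predicted (an assumption).
        name = box.speaker if box.speaker is not None else "Unknown"
        lines.append(f"{name}: {box.text}")
    return lines


# Made-up example data, purely for illustration.
boxes = [
    TextBox(page=1, text="Where are we?", essential=True, speaker="Aiko"),
    TextBox(page=1, text="BOOM", essential=False, speaker=None),
    TextBox(page=2, text="Somewhere new.", essential=True, speaker=None),
]
transcript = build_transcript(boxes)
```

Using the same name string for a character on every page is what keeps the transcript consistent across a chapter; how those names are predicted is the part the paper's model addresses and is not shown here.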
Taxonomy
Topics: Comics and Graphic Narratives · Digital Games and Media · Artificial Intelligence in Games
