The Manga Whisperer: Automatically Generating Transcriptions for Comics
Ragav Sachdeva, Andrew Zisserman

TL;DR
This paper introduces Magi, a unified model that automatically detects panels, clusters characters, associates dialogues, and generates transcriptions for manga, making comics more accessible to visually impaired individuals.
Contribution
The work presents a novel integrated system for manga diarisation, including detection, clustering, and transcription, along with an annotated benchmark dataset.
Findings
Magi effectively detects panels, text, and character boxes.
It accurately clusters characters without prior knowledge of cluster count.
The system successfully generates readable dialogue transcriptions.
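The page does not say how Magi clusters characters without being told how many identities appear. The sketch below shows one generic way to do this. It is an assumption, not necessarily the authors' procedure. Two character boxes are linked when a hypothetical pairwise "same identity" score exceeds a threshold. Each connected component of the resulting graph becomes one character, so the cluster count comes from the data rather than being fixed in advance. The names `cluster_by_threshold`, `affinity`, and `threshold` are illustrative only.

```python
from typing import Dict, List


def cluster_by_threshold(affinity: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Assign an identity id to each character box (hypothetical helper).

    affinity[i][j] is an assumed score that boxes i and j show the same
    character. Boxes are linked when the score exceeds `threshold`, and each
    connected component of that graph becomes one cluster, so the number of
    clusters is not specified up front.
    """
    n = len(affinity)
    parent = list(range(n))  # union-find forest, one node per character box

    def find(i: int) -> int:
        # Walk to the root, halving the path on the way to keep trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of boxes whose same-identity score clears the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if affinity[i][j] > threshold:
                parent[find(i)] = find(j)

    # Relabel component roots as consecutive ids 0, 1, 2, ... in order of first appearance.
    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


scores = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.3],
    [0.2, 0.1, 0.3, 1.0],
]
print(cluster_by_threshold(scores))  # [0, 0, 1, 2]
```

Here boxes 0 and 1 are grouped as one character and boxes 2 and 3 each remain on their own. A lower threshold merges more aggressively. The threshold is the single knob that replaces choosing the number of clusters.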
Abstract
In the past few decades, Japanese comics, commonly referred to as Manga, have transcended both cultural and linguistic boundaries to become a true worldwide sensation. Yet, the inherent reliance on visual cues and illustration within manga renders it largely inaccessible to individuals with visual impairments. In this work, we seek to address this substantial barrier, with the aim of ensuring that manga can be appreciated and actively engaged with by everyone. Specifically, we tackle the problem of diarisation, i.e., generating a transcription of who said what and when, in a fully automatic way. To this end, we make the following contributions: (1) we present a unified model, Magi, that is able to (a) detect panels, text boxes and character boxes, (b) cluster characters by identity (without knowing the number of clusters a priori), and (c) associate dialogues to their speakers; (2) we propose…
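To make "who said what and when" concrete, the sketch below assembles a transcript from the three pieces the abstract lists: text boxes, character identities, and dialogue-to-speaker association. It is a hypothetical illustration, not Magi's pipeline, and rests on several assumptions:

- the text boxes are already OCR'd and sorted into reading order, which supplies the "when";
- a model has produced a score for each text/character pair;
- each character box carries a cluster id.

A text whose best speaker score falls below `min_score` (for example, narration) is labelled "Unknown". `build_transcript`, `speaker_scores`, and `min_score` are invented names.

```python
from typing import List, Sequence


def build_transcript(
    texts: Sequence[str],
    speaker_scores: Sequence[Sequence[float]],
    character_cluster: Sequence[int],
    min_score: float = 0.5,
) -> List[str]:
    """Turn reading-ordered text boxes into 'Speaker: text' lines (hypothetical helper).

    texts[t]              -- string of text box t, assumed already in reading order
    speaker_scores[t][c]  -- assumed score that character box c speaks text t
    character_cluster[c]  -- identity id of character box c (e.g. from clustering)
    """
    lines = []
    for t, text in enumerate(texts):
        scores = speaker_scores[t]
        # Pick the most likely speaker box; None when the page has no characters.
        best = max(range(len(scores)), key=lambda c: scores[c], default=None)
        if best is None or scores[best] < min_score:
            speaker = "Unknown"  # e.g. narration or an off-panel speaker
        else:
            speaker = f"Character {character_cluster[best] + 1}"
        lines.append(f"{speaker}: {text}")
    return lines


texts = ["Where are you going?", "To the station.", "Meanwhile..."]
speaker_scores = [[0.9, 0.1, 0.2], [0.1, 0.2, 0.8], [0.1, 0.1, 0.2]]
character_cluster = [0, 1, 1]  # boxes 1 and 2 depict the same character
for line in build_transcript(texts, speaker_scores, character_cluster):
    print(line)
```

Because speakers are reported by cluster id rather than by box, the same character keeps the same label across panels even when drawn in different boxes.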
Taxonomy
Topics: Video Analysis and Summarization · Natural Language Processing Techniques · Multimodal Machine Learning Applications
