Toward accessible comics for blind and low vision readers

Christophe Rigaud (L3I); Jean-Christophe Burie (L3I); Samuel Petit (Comix AI)

arXiv:2407.08248·cs.AI·September 11, 2024

TL;DR

This paper proposes a method combining computer vision, OCR, and prompt engineering to generate detailed, context-aware text descriptions of comic strips, aiming to improve accessibility for blind and low vision readers.

Contribution

It introduces a novel approach that integrates visual content analysis with language models to produce comprehensive comic descriptions for accessibility.

Findings

1. Effective extraction of comic content features
2. Generation of detailed, context-aware descriptions
3. Potential to enhance audiobook and eBook accessibility

Abstract

This work explores how to fine-tune large language models using prompt engineering techniques with contextual information to generate an accurate text description of the full story, ready to be forwarded to off-the-shelf speech synthesis tools. We propose to use existing computer vision and optical character recognition techniques to build a grounded context from the comic strip image content, such as panels, characters, text, reading order, and the association of bubbles and characters. Then we infer character identification and generate a comic book script with context-aware panel descriptions, including each character's appearance, posture, mood, and dialogue. We believe that such enriched content descriptions can easily be used to produce audiobooks and eBooks with various voices for characters and captions, and with sound effects played.
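To make the idea of a grounded context concrete, here is a minimal Python sketch of how detected panels, characters, and bubble-to-speaker links might be serialized into a prompt for a language model. It is an illustration only, not the authors' implementation: the `Panel` and `Bubble` classes, the `reading_order` and `build_prompt` functions, the prompt wording, and the simple row-based ordering (top to bottom, then left to right, assuming a Western layout) are all hypothetical. The detection and OCR results are assumed to come from upstream tools and are hard-coded here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Bubble:
    text: str                # OCR output for one bubble or caption (assumed given)
    speaker: Optional[str]   # character label linked to the bubble; None for captions


@dataclass
class Panel:
    box: Tuple[int, int, int, int]  # (x, y, width, height) in pixels
    characters: List[str] = field(default_factory=list)
    bubbles: List[Bubble] = field(default_factory=list)


def reading_order(panels, row_tolerance=40):
    """Group panels into rows by their top edge, then read each row left to right.

    Hypothetical heuristic for a Western layout; real pages may need more care.
    """
    rows = []
    for panel in sorted(panels, key=lambda p: p.box[1]):
        # A panel joins the current row if its top edge is close to the row's first panel.
        if rows and abs(panel.box[1] - rows[-1][0].box[1]) <= row_tolerance:
            rows[-1].append(panel)
        else:
            rows.append([panel])
    return [p for row in rows for p in sorted(row, key=lambda p: p.box[0])]


def build_prompt(panels):
    """Serialize the grounded context into plain text for a language model."""
    lines = [
        "You are describing a comic strip for blind and low vision readers.",
        "Using only the grounded context below, write a script with one",
        "description per panel (appearance, posture, mood) and its dialogue.",
        "",
    ]
    for index, panel in enumerate(reading_order(panels), start=1):
        lines.append(f"Panel {index}:")
        lines.append("  Characters: " + (", ".join(panel.characters) or "none detected"))
        for bubble in panel.bubbles:
            speaker = bubble.speaker or "Caption"
            lines.append(f'  {speaker}: "{bubble.text}"')
    return "\n".join(lines)


# Toy input standing in for detector and OCR output (made-up values).
strip = [
    Panel(box=(420, 10, 400, 300), characters=["character_2"],
          bubbles=[Bubble("Not again!", "character_2")]),
    Panel(box=(10, 20, 400, 300), characters=["character_1"],
          bubbles=[Bubble("Look at the sky.", "character_1")]),
]
prompt = build_prompt(strip)
print(prompt)
```

The resulting text would be sent to a language model together with any further instructions; keeping the context as explicit per-panel lines is one way to constrain the model to what was actually detected in the image.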
