# Deep CNN-based Speech Balloon Detection and Segmentation for Comic Books

**Authors:** David Dubray, Jochen Laubrock

arXiv: 1902.08137 · 2019-02-22

## TL;DR

This paper presents a deep learning approach to detecting and segmenting speech balloons, including their tails, in comic books. It uses a U-Net-inspired fully convolutional network with a VGG-16 encoder and reaches state-of-the-art performance, with an F1-score above 0.94.

## Contribution

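The paper introduces a novel fully convolutional network for speech balloon segmentation. It combines a U-Net-style architecture with a VGG-16 encoder and is trained on annotated pages of the Graphic Narrative Corpus. The model achieves state-of-the-art accuracy and remains robust to complex balloon shapes.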
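To make the encoder-decoder structure concrete, here is a minimal shape-tracing sketch of a U-Net-style network on a VGG-16 encoder. Only the VGG-16 block configuration is standard. The helper `trace_unet_vgg16`, the stage names, the bottleneck width, the decoder widths, and the choice to take skips from every block before pooling are hypothetical and are not taken from the paper. The sketch also assumes "same"-padded 3x3 convolutions, so only the five max-pools change the spatial size.

```python
# Standard VGG-16 convolutional blocks: (number of 3x3 conv layers, output channels).
# Each block is followed by a 2x2 max-pool with stride 2.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def trace_unet_vgg16(height: int, width: int) -> list[tuple[str, int, int, int]]:
    """Return (stage, channels, height, width) for every stage of the sketch.

    Assumes 'same'-padded 3x3 convolutions, so only the five max-pools change
    the spatial size and the input must be divisible by 2**5 = 32.
    """
    if height % 32 or width % 32:
        raise ValueError("height and width must be divisible by 32")
    stages = []
    skips = []
    h, w = height, width
    for i, (_n_convs, channels) in enumerate(VGG16_BLOCKS, start=1):
        # Keep each encoder block's pre-pooling output as a U-Net skip connection.
        stages.append((f"enc{i}", channels, h, w))
        skips.append((i, channels, h, w))
        h, w = h // 2, w // 2  # 2x2 max-pool halves the resolution
    # Hypothetical bottleneck width; the paper's value may differ.
    stages.append(("bottleneck", 512, h, w))
    for i, channels, skip_h, skip_w in reversed(skips):
        h, w = h * 2, w * 2  # 2x upsampling (e.g. a transposed convolution)
        # The upsampled map must match the skip it is concatenated with.
        assert (h, w) == (skip_h, skip_w)
        # Hypothetical decoder width: mirror the encoder block's channel count.
        stages.append((f"dec{i}", channels, h, w))
    # 1x1 convolution to a single channel, read as P(pixel belongs to a balloon).
    stages.append(("output", 1, h, w))
    return stages
```

For example, `trace_unet_vgg16(512, 768)` shrinks the page to a 16x24 bottleneck and then restores a full 512x768 single-channel probability map. The network has no fully connected layers, so under these assumptions any input whose sides are multiples of 32 works. This is one property that makes fully convolutional designs practical for whole comic pages of varying size.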

## Key findings

- F1-score above 0.94 on the Graphic Narrative Corpus
- Qualitatively robust to wiggly tails, curved corners, and even illusory contours
- Learns to distinguish speech balloons from captions
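To unpack the reported F1-score of over 0.94, the sketch below computes a pixel-level F1 between a predicted and an annotated binary mask. F1 is the harmonic mean of precision and recall, which simplifies to 2·TP / (2·TP + FP + FN). This sketch assumes pixel-wise evaluation. The paper may aggregate differently, for example per page or per balloon. The function name `pixel_f1` and the convention that two empty masks score 1.0 are our own assumptions.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # rows of pixels: 0 = background, 1 = balloon


def pixel_f1(pred: Mask, truth: Mask) -> float:
    """Pixel-level F1 = 2*TP / (2*TP + FP + FN) between same-sized binary masks."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # balloon pixel correctly predicted
            elif p:
                fp += 1  # background pixel predicted as balloon
            elif t:
                fn += 1  # balloon pixel the model missed
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if denom == 0 else 2 * tp / denom


truth = [[0, 1, 1],
         [0, 1, 1],
         [0, 0, 0]]
pred = [[0, 1, 1],
        [0, 1, 0],
        [0, 1, 0]]
print(pixel_f1(pred, truth))  # 3 TP, 1 FP, 1 FN -> 0.75
```

In this toy example, precision and recall are both 3/4, so F1 is 0.75. A score above 0.94 means that almost all balloon pixels are recovered while very few background pixels are mislabeled.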

## Abstract

We develop a method for the automated detection and segmentation of speech balloons in comic books, including their carrier and tails. Our method is based on a deep convolutional neural network that was trained on annotated pages of the Graphic Narrative Corpus. More precisely, we are using a fully convolutional network approach inspired by the U-Net architecture, combined with a VGG-16 based encoder. The trained model delivers state-of-the-art performance with an F1-score of over 0.94. Qualitative results suggest that wiggly tails, curved corners, and even illusory contours do not pose a major problem. Furthermore, the model has learned to distinguish speech balloons from captions. We compare our model to earlier results and discuss some possible applications.
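The abstract covers both detection and segmentation, yet a U-Net-style network outputs only a per-pixel probability map. One common way to turn such a map into individual balloon detections is to threshold it and then label connected components. The sketch below does this with 4-connectivity. It is an assumed post-processing step for illustration, not the procedure described in the paper. The helper names `detect_balloons` and `bounding_box` and the `threshold` and `min_pixels` parameters are hypothetical.

```python
from collections import deque


def detect_balloons(prob: list[list[float]], threshold: float = 0.5,
                    min_pixels: int = 1) -> list[list[tuple[int, int]]]:
    """Binarize a probability map and split it into 4-connected components.

    Each component is returned as a sorted list of (row, col) pixels;
    components smaller than `min_pixels` are dropped as noise.
    """
    height = len(prob)
    width = len(prob[0]) if height else 0
    mask = [[p >= threshold for p in row] for row in prob]
    seen = [[False] * width for _ in range(height)]
    balloons = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) >= min_pixels:
                balloons.append(sorted(component))
    return balloons


def bounding_box(component: list[tuple[int, int]]) -> tuple[int, int, int, int]:
    """Return (top, left, bottom, right) of a component, inclusive."""
    rows = [y for y, _ in component]
    cols = [x for _, x in component]
    return min(rows), min(cols), max(rows), max(cols)


prob = [[0.9, 0.8, 0.1, 0.0],
        [0.7, 0.2, 0.0, 0.6],
        [0.1, 0.0, 0.0, 0.9]]
for blob in detect_balloons(prob):
    print(bounding_box(blob))  # (0, 0, 1, 1) then (1, 3, 2, 3)
```

A limitation of this simple scheme is that two balloons whose predicted masks touch are merged into a single detection. Raising `threshold` or adding a separation step would be needed in that case.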

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.08137/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1902.08137/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/1902.08137/full.md

---
Source: https://tomesphere.com/paper/1902.08137