# Labeling, Cutting, Grouping: an Efficient Text Line Segmentation Method   for Medieval Manuscripts

**Authors:** Michele Alberti, Lars V\"ogtlin, Vinaychandran Pondenkandath, Mathias, Seuret, Rolf Ingold, and Marcus Liwicki

arXiv: 1906.11894 · 2019-07-02

## TL;DR

This paper presents a novel text-line segmentation method for medieval manuscripts that combines deep-learning pre-classification with semantic segmentation, significantly improving accuracy on challenging handwritten documents.

## Contribution

It introduces a new approach using semantic pixel segmentation as a pre-processing step and a robust algorithm for high-accuracy text-line extraction in complex manuscripts.

## Key findings

- Reduced segmentation error by 80.7% on challenging datasets
- Achieved 99.42% line IU accuracy on medieval manuscripts
- Demonstrated effectiveness across various scripts and datasets

## Abstract

This paper introduces a new way for text-line extraction by integrating deep-learning based pre-classification and state-of-the-art segmentation methods. Text-line extraction in complex handwritten documents poses a significant challenge, even to the most modern computer vision algorithms. Historical manuscripts are a particularly hard class of documents as they present several forms of noise, such as degradation, bleed-through, interlinear glosses, and elaborated scripts. In this work, we propose a novel method which uses semantic segmentation at pixel level as intermediate task, followed by a text-line extraction step. We measured the performance of our method on a recent dataset of challenging medieval manuscripts and surpassed state-of-the-art results by reducing the error by 80.7%. Furthermore, we demonstrate the effectiveness of our approach on various other datasets written in different scripts. Hence, our contribution is two-fold. First, we demonstrate that semantic pixel segmentation can be used as strong denoising pre-processing step before performing text line extraction. Second, we introduce a novel, simple and robust algorithm that leverages the high-quality semantic segmentation to achieve a text-line extraction performance of 99.42% line IU on a challenging dataset.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.11894/full.md

## Figures

26 figures with captions in the complete paper: https://tomesphere.com/paper/1906.11894/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/1906.11894/full.md

---
Source: https://tomesphere.com/paper/1906.11894