A Neural Model for Text Localization, Transcription and Named Entity Recognition in Full Pages

Manuel Carbonell; Alicia Fornés; Mauricio Villegas; Josep Lladós

arXiv:1912.10016·cs.CV·May 5, 2020

TL;DR

This paper introduces an end-to-end neural model that simultaneously performs text localization, transcription, and named entity recognition on full document pages, streamlining information extraction processes.

Contribution

It presents a novel unified model combining object detection and recognition tasks, enabling joint learning and execution of multiple document analysis tasks in a single step.

Findings

01

The model jointly performs text detection, transcription, and named entity recognition (NER).

02

Shared features improve performance across tasks.

03

Compared to sequential methods, the approach offers efficiency and potential accuracy benefits.

Abstract

In recent years, the consolidation of deep neural network architectures for information extraction from document images has brought large improvements in each of the tasks involved in this process: text localization, transcription, and named entity recognition. However, the process is traditionally performed with a separate method for each task. In this work we propose an end-to-end model that combines a one-stage object detection network with branches for text recognition and named entity recognition, so that shared features can be learned simultaneously from the training error of each task. The model thus jointly performs handwritten text detection, transcription, and named entity recognition at page level in a single feed-forward step. We exhaustively evaluate our approach on different datasets, discussing its advantages…
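One way to picture how shared features can learn "from the training error of each of the tasks" is a single scalar objective that adds up the per-task errors. The sketch below is only an illustration under that assumption: the abstract does not state how the errors are combined, and the names `TaskLosses`, `joint_loss`, and the weights `w_det`, `w_text`, `w_ner` are hypothetical, not taken from the paper or its code.

```python
from dataclasses import dataclass


@dataclass
class TaskLosses:
    """Per-task training errors for one page (illustrative values only)."""
    detection: float      # error of the text localization branch
    transcription: float  # error of the text recognition branch
    entity: float         # error of the named entity recognition branch


def joint_loss(losses: TaskLosses,
               w_det: float = 1.0,
               w_text: float = 1.0,
               w_ner: float = 1.0) -> float:
    """Weighted sum of the three task errors.

    Assumption: a plain weighted sum. Minimizing one combined value is a
    common way to let every task contribute to the update of shared layers.
    """
    return (w_det * losses.detection
            + w_text * losses.transcription
            + w_ner * losses.entity)


page_losses = TaskLosses(detection=0.5, transcription=2.0, entity=0.25)
total = joint_loss(page_losses)                  # all tasks weighted equally
rebalanced = joint_loss(page_losses, w_text=0.5)  # down-weight transcription
```

In a real training loop the three terms would come from the network's branches and the combined value would be backpropagated through the shared feature extractor; the weights would typically be tuned so that no single task dominates.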
