TL;DR
This paper introduces an end-to-end neural model that simultaneously performs text localization, transcription, and named entity recognition on full document pages, streamlining information extraction processes.
Contribution
It presents a novel unified model combining object detection and recognition tasks, enabling joint learning and execution of multiple document analysis tasks in a single step.
Findings
The model effectively performs joint text detection, transcription, and NER.
Shared features improve performance across tasks.
Compared to sequential methods, the approach offers efficiency and potential accuracy benefits.
Abstract
In recent years, the consolidation of deep neural network architectures for information extraction in document images has brought large improvements in the performance of each of the tasks involved in this process: text localization, transcription, and named entity recognition. However, this process is traditionally performed with separate methods for each task. In this work we propose an end-to-end model that combines a one-stage object detection network with branches for the recognition of text and named entities, respectively, so that shared features can be learned simultaneously from the training error of each of the tasks. By doing so, the model jointly performs handwritten text detection, transcription, and named entity recognition at page level with a single feed-forward step. We exhaustively evaluate our approach on different datasets, discussing its advantages…
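One common way to let shared features learn "from the training error of each of the tasks" is to sum the per-task losses into a single training objective. The sketch below illustrates that idea only. The abstract does not state the paper's actual loss formulation, so the names `TaskLosses` and `joint_loss` and the weights `w_det`, `w_htr`, and `w_ner` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskLosses:
    """Per-task training errors for one page (illustrative values only)."""
    detection: float      # text localization error
    transcription: float  # text recognition error
    ner: float            # named entity tagging error


def joint_loss(losses: TaskLosses,
               w_det: float = 1.0,
               w_htr: float = 1.0,
               w_ner: float = 1.0) -> float:
    """Weighted sum of the three task losses.

    Assumption: the weights are hypothetical hyperparameters, not values
    taken from the paper. Minimizing this single scalar lets every task
    contribute to the update of the shared features.
    """
    return (w_det * losses.detection
            + w_htr * losses.transcription
            + w_ner * losses.ner)


# Example: equal weights simply add the three errors together.
example = TaskLosses(detection=0.5, transcription=1.0, ner=0.25)
total = joint_loss(example)
```

Raising one weight, for instance `joint_loss(example, w_htr=2.0)`, would make transcription errors count for more when the shared features are updated.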
