Faster DAN: Multi-target Queries with Document Positional Encoding for End-to-end Handwritten Document Recognition

Denis Coquenet; Clément Chatelain; Thierry Paquet

arXiv:2301.10593·cs.CV·August 31, 2023

TL;DR

Faster DAN speeds up end-to-end handwritten document recognition with a two-step process: it first predicts the first character of each text line, then completes all text lines in parallel. Inference is at least 4 times faster than with standard DAN, while accuracy remains competitive.

Contribution

The paper proposes a two-step prediction strategy that combines multi-target queries with a dedicated document positional encoding scheme to speed up end-to-end handwritten document recognition at prediction time.

Findings

01 Inference is at least 4 times faster than standard DAN on whole single-page and double-page images.

02 Recognition results remain competitive with standard DAN.

03 Once the first character of each line is predicted, all text lines are completed in parallel.

Abstract

Recent advances in handwritten text recognition have made it possible to recognize whole documents in an end-to-end way: the Document Attention Network (DAN) recognizes the characters one after the other through an attention-based prediction process until reaching the end of the document. However, this autoregressive process leads to inference that cannot benefit from any parallelization optimization. In this paper, we propose Faster DAN, a two-step strategy to speed up the recognition process at prediction time: the model predicts the first character of each text line in the document, and then completes all the text lines in parallel through multi-target queries and a specific document positional encoding scheme. Faster DAN reaches competitive results compared to standard DAN, while being at least 4 times faster on whole single-page and double-page images of the RIMES 2009, READ 2016 and MAURDOR…
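
The two-step strategy described in the abstract can be illustrated with a toy sketch. The Python below is not the authors' implementation: it replaces the network with an oracle that already knows the ground-truth lines, and it counts only decoding iterations, ignoring end-of-line and end-of-document tokens. The function names `sequential_decode` and `two_step_decode` and the sample page are hypothetical.

```python
from typing import List, Tuple


def sequential_decode(lines: List[str]) -> Tuple[List[str], int]:
    """Character-by-character decoding: one iteration per character."""
    decoded: List[str] = []
    iterations = 0
    for line in lines:
        current = ""
        for char in line:  # the oracle stands in for the model's prediction
            current += char
            iterations += 1
        decoded.append(current)
    return decoded, iterations


def two_step_decode(lines: List[str]) -> Tuple[List[str], int]:
    """Two-step decoding: first characters first, then all lines in parallel."""
    # Step 1: predict the first character of each line, one line per iteration.
    partial: List[str] = []
    iterations = 0
    for line in lines:
        partial.append(line[:1])
        iterations += 1
    # Step 2: each iteration extends every unfinished line by one character.
    while any(len(p) < len(t) for p, t in zip(partial, lines)):
        partial = [t[: len(p) + 1] for p, t in zip(partial, lines)]
        iterations += 1
    return partial, iterations


# Hypothetical three-line page used only for illustration.
page = ["Dear Sir,", "thank you", "Best"]
seq_text, seq_iters = sequential_decode(page)
par_text, par_iters = two_step_decode(page)
print(seq_iters, par_iters)
```

On this toy page the sequential loop needs 22 iterations and the two-step loop needs 11: 3 for the first characters, then 8 to finish the longest line. The ratio here depends only on the number of lines and their lengths; it is not a reproduction of the speed-up reported in the paper.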

Code & Models

Repositories

factodeeplearning/fasterdan
PyTorch, official
