Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes
Christoph Wick, Jochen Zöllner, Tobias Grüning

TL;DR
This paper introduces a hybrid decoding method combining CTC and sequence-to-sequence models for handwritten text recognition, improving accuracy and efficiency by penalizing invalid paths during beam search.
Contribution
It proposes a novel CTC-Prefix-Score integration into S2S decoding, reducing model complexity while enhancing recognition performance.
Findings
Achieves a 2.95% Character Error Rate (CER) on the IAM dataset when using synthetic pretraining and a language model.
Requires 10-20 times fewer parameters than state-of-the-art methods.
Delivers competitive results across three HTR datasets.
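For readers unfamiliar with the metric, CER is commonly computed as the character-level edit distance between prediction and reference, divided by the total number of reference characters. The sketch below illustrates that convention. The function names are hypothetical, and the exact normalisation and text preprocessing used in the paper are not stated on this page, so treat this as an assumption rather than the authors' evaluation script.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total edit distance over total reference length."""
    errors = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    chars = sum(len(r) for r in references)
    return errors / chars
```

Summing errors and lengths over the whole corpus, as done here, weights long lines more than averaging per-line rates would; which of the two a given paper reports should be checked before comparing numbers.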
Abstract
In contrast to Connectionist Temporal Classification (CTC) approaches, Sequence-To-Sequence (S2S) models for Handwritten Text Recognition (HTR) suffer from errors such as skipped or repeated words which often occur at the end of a sequence. In this paper, to combine the best of both approaches, we propose to use the CTC-Prefix-Score during S2S decoding. Hereby, during beam search, paths that are invalid according to the CTC confidence matrix are penalised. Our network architecture is composed of a Convolutional Neural Network (CNN) as visual backbone, bidirectional Long-Short-Term-Memory-Cells (LSTMs) as encoder, and a decoder which is a Transformer with inserted mutual attention layers. The CTC confidences are computed on the encoder while the Transformer is only used for character-wise S2S decoding. We evaluate this setup on three HTR data sets: IAM, Rimes, and StAZH. On IAM, we…
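The CTC-Prefix-Score mentioned in the abstract can be illustrated with the standard prefix recursion used in hybrid CTC/attention decoding. The following is a minimal sketch under stated assumptions, not the authors' implementation: the blank index, all function names, and the interpolation weight `lam` are hypothetical, and the page does not say how the paper combines the two scores. It works on plain probabilities for readability; a practical decoder would run the same recursion in log space.

```python
import math
from itertools import accumulate
from operator import mul

BLANK = 0  # assumed index of the CTC blank label


def init_state(probs):
    """Forward variables of the empty prefix.

    r_n[t]: probability that frames 0..t emit exactly the prefix, ending in a non-blank.
    r_b[t]: the same, but ending in a blank.
    """
    r_n = [0.0] * len(probs)
    r_b = list(accumulate((frame[BLANK] for frame in probs), mul))  # blanks only
    return r_n, r_b


def extend(probs, prefix, state, c):
    """Append label c to prefix; return (prefix probability, new state)."""
    r_n_g, r_b_g = state
    T = len(probs)
    r_n, r_b = [0.0] * T, [0.0] * T
    r_n[0] = probs[0][c] if not prefix else 0.0
    psi = r_n[0]
    for t in range(1, T):
        # Mass of the old prefix at t-1 that c may follow directly.
        # A repeated label must be separated by a blank.
        phi = r_b_g[t - 1]
        if not prefix or prefix[-1] != c:
            phi += r_n_g[t - 1]
        r_n[t] = (r_n[t - 1] + phi) * probs[t][c]
        r_b[t] = (r_b[t - 1] + r_n[t - 1]) * probs[t][BLANK]
        psi += phi * probs[t][c]  # c first appears at frame t; later frames are free
    return psi, (r_n, r_b)


def ctc_prefix_score(probs, labels):
    """Probability that the CTC output starts with labels."""
    state, psi = init_state(probs), 1.0
    for i, c in enumerate(labels):
        psi, state = extend(probs, labels[:i], state, c)
    return psi


def combined_log_score(log_p_s2s, p_ctc_prefix, lam=0.5):
    """Hypothetical log-linear mix of the S2S score and the CTC prefix score."""
    return (1.0 - lam) * log_p_s2s + lam * math.log(max(p_ctc_prefix, 1e-30))
```

In a beam search, each hypothesis would keep its own `state`, so scoring one candidate extension costs a single pass over the T encoder frames. A prefix that the CTC confidence matrix cannot produce receives a prefix probability near zero and therefore a strongly negative combined score, which is the penalising effect the abstract describes.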
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Adam · Softmax · Dropout · Dense Connections · Layer Normalization · Absolute Position Encodings
