Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes

Christoph Wick; Jochen Zöllner; Tobias Grüning

arXiv:2110.05909·cs.CV·March 30, 2022

TL;DR

This paper combines CTC and sequence-to-sequence (S2S) decoding for handwritten text recognition: during beam search, paths that are invalid according to the CTC confidence matrix are penalised, which improves recognition accuracy.

Contribution

It integrates the CTC-Prefix-Score into S2S beam-search decoding and reports competitive recognition performance with a considerably smaller model.

Findings

01 Achieved a 2.95% character error rate (CER) on the IAM dataset with synthetic pretraining and a language model

02 The model requires 10-20 times fewer parameters than state-of-the-art methods

03 Demonstrated competitive results across three HTR datasets
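The CER figure above is a character-level edit-distance metric. As a minimal sketch of how such a number is typically computed, assuming errors and reference characters are summed over the whole test set (the page does not say how the paper normalizes), with `levenshtein` and `cer` as hypothetical helper names:

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def cer(predictions, references):
    """Character error rate; corpus-level normalization is an assumption."""
    errors = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    chars = sum(len(r) for r in references)
    return errors / chars
```

For example, `cer(["kitten"], ["sitting"])` is three edits over seven reference characters.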

Abstract

In contrast to Connectionist Temporal Classification (CTC) approaches, Sequence-To-Sequence (S2S) models for Handwritten Text Recognition (HTR) suffer from errors such as skipped or repeated words which often occur at the end of a sequence. In this paper, to combine the best of both approaches, we propose to use the CTC-Prefix-Score during S2S decoding. Hereby, during beam search, paths that are invalid according to the CTC confidence matrix are penalised. Our network architecture is composed of a Convolutional Neural Network (CNN) as visual backbone, bidirectional Long-Short-Term-Memory-Cells (LSTMs) as encoder, and a decoder which is a Transformer with inserted mutual attention layers. The CTC confidences are computed on the encoder while the Transformer is only used for character-wise S2S decoding. We evaluate this setup on three HTR data sets: IAM, Rimes, and StAZH. On IAM, we…
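The CTC-Prefix-Score mentioned in the abstract can be illustrated with the standard CTC prefix-probability recursion. The following is a minimal sketch, not the authors' implementation: it works in the probability domain (a practical decoder would use log space), and the names `ctc_prefix_prob`, `rescore`, and the interpolation weight `lam` are assumptions made for illustration. The linear interpolation of log scores is one common way to combine the two models and may differ from the paper's exact formula.

```python
import numpy as np


def ctc_prefix_prob(probs, prefix, blank=0):
    """Probability that the CTC output starts with `prefix`.

    probs:  (T, V) array of per-frame posteriors (the CTC confidence matrix).
    prefix: list of non-blank label ids.
    """
    T = probs.shape[0]
    # Forward variables of the empty prefix: only blanks have been emitted.
    g_n = np.zeros(T)                    # paths ending in a non-blank
    g_b = np.cumprod(probs[:, blank])    # paths ending in a blank
    psi, last = 1.0, None
    for c in prefix:
        h_n, h_b = np.zeros(T), np.zeros(T)
        # The first frame can only emit c if nothing precedes it.
        h_n[0] = probs[0, c] if last is None else 0.0
        psi = h_n[0]
        for t in range(1, T):
            # Mass of the shorter prefix that may be extended by c at frame t;
            # a repeated label needs a blank in between.
            phi = g_b[t - 1] + (0.0 if c == last else g_n[t - 1])
            h_n[t] = (h_n[t - 1] + phi) * probs[t, c]
            h_b[t] = (h_b[t - 1] + h_n[t - 1]) * probs[t, blank]
            psi += phi * probs[t, c]
        g_n, g_b, last = h_n, h_b, c
    return psi


def rescore(log_p_s2s, probs, prefix, lam=0.5, blank=0):
    """Hypothetical beam score: interpolate S2S and CTC prefix log scores."""
    log_p_ctc = np.log(max(ctc_prefix_prob(probs, prefix, blank), 1e-300))
    return (1.0 - lam) * log_p_s2s + lam * log_p_ctc
```

During beam search, each candidate extension of a hypothesis would be scored with something like `rescore`, so that a hypothesis the S2S decoder likes but the CTC matrix considers nearly impossible (for instance a repeated or skipped word) drops in the beam.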

Code & Models

Repositories

planet-ai-gmbh/tfaip-hybrid-ctc-s2s (tf, official)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Adam · Softmax · Dropout · Dense Connections · Layer Normalization · Absolute Position Encodings