TL;DR
This paper investigates character-based handwritten text recognition using attention networks, comparing different attention mechanisms and demonstrating the importance of precise alignment for transcription accuracy.
Contribution
It analyzes softmax and sigmoid attention mechanisms in character-based HTR and highlights how alignment precision affects transcription performance.
Findings
Softmax attention provides more precise character alignment.
Sigmoid attention tends to spread over multiple characters at each decoding step, giving a less precise alignment.
Linear attention weights lead to poor performance due to lack of alignment.
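The three findings above can be illustrated with a small numerical sketch. This is not the paper's implementation: the score values, the function names `attention_weights` and `context_vector`, and the use of an identity map for the "linear" case are all assumptions made for illustration. It only shows why a softmax over alignment scores forces positions to compete (one sharp peak), why independent sigmoids let several positions stay active at once, and why unnormalized linear weights impose no alignment at all.

```python
import numpy as np


def attention_weights(scores, kind):
    """Turn raw alignment scores into attention weights.

    `kind` selects the activation. These are illustrative definitions,
    not necessarily the exact formulations used in the paper.
    """
    scores = np.asarray(scores, dtype=float)
    if kind == "softmax":
        # Subtract the max for numerical stability; weights sum to 1,
        # so positions compete with each other.
        exp = np.exp(scores - scores.max())
        return exp / exp.sum()
    if kind == "sigmoid":
        # Each weight lies in (0, 1) independently of the others,
        # so several positions can be strongly attended at once.
        return 1.0 / (1.0 + np.exp(-scores))
    if kind == "linear":
        # Assumed here to be the identity: no bounding, no competition.
        return scores
    raise ValueError(f"unknown attention kind: {kind}")


def context_vector(weights, encoder_states):
    """Weighted sum of encoder states (one row per input position)."""
    return weights @ encoder_states


# Hypothetical alignment scores for one decoding step over 5 positions.
scores = np.array([-2.0, 0.5, 4.0, 0.0, -1.0])
# Toy encoder states: one-hot rows, so the context vector equals the weights.
encoder_states = np.eye(5)

for kind in ("softmax", "sigmoid", "linear"):
    w = attention_weights(scores, kind)
    c = context_vector(w, encoder_states)
    print(kind, np.round(w, 3), "sum =", round(float(w.sum()), 3))
```

With these toy scores, the softmax weights concentrate almost entirely on the third position, while the sigmoid weights leave two positions above 0.5 and sum to more than 1. The linear weights are just the raw scores, including negative values, so nothing ties a decoding step to a particular region of the input.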
Abstract
The paper approaches the task of handwritten text recognition (HTR) with attentional encoder-decoder networks trained on sequences of characters, rather than words. We experiment on lines of text from popular handwriting datasets and compare different activation functions for the attention mechanism used for aligning image pixels and target characters. We find that softmax attention focuses heavily on individual characters, while sigmoid attention focuses on multiple characters at each step of the decoding. When the sequence alignment is one-to-one, softmax attention is able to learn a more precise alignment at each step of the decoding, whereas the alignment generated by sigmoid attention is much less precise. When a linear function is used to obtain attention weights, the model predicts a character by looking at the entire sequence of characters and performs poorly because it lacks a…
Taxonomy
Methods: Softmax
